Ejjeh, Adel via llvm-dev
2018-Aug-07 16:51 UTC
[llvm-dev] Generating a loop from llvm bitcode
Hello I am developing a backend that generates OpenCL from llvm bitcode. I start with a revived fork of the original LLVM C-Backend and have been modifying it. One thing that the backend lacked was generating proper loop structures; instead it relied on labels and goto statements. Therefore, I am trying to find a way to identify the loop structure and print it out in the backend. So far, the main issue I’ve been having trouble with is identifying the loop induction variable in a straight forward manner; getCanonicalInductionVariable doesn’t always succeed, and I’ve tried using InductionDescriptor, but it fails in cases where the CFG doesn’t include a loop preheader. Also, in both cases, I couldn’t find a direct way of determining the loop exit condition. I am wondering if there are any passes or data structures that would be able to help me accomplish what I am trying to do that I might be missing? Or does the community have any other ideas as to how I should approach this issue? Any input is much appreciated. Regards Adel -- Adel Ejjeh PhD Candidate | Computer Science University of Illinois at Urbana-Champaign Siebel Center for Computer Science 201 N Goodwin Ave, Urbana, IL 61801 aejjeh at illinois.edu<mailto:aejjeh at illinois.edu> | adel.ejjeh at gmail.com<mailto:adel.ejjeh at gmail.com> -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20180807/ffe4172f/attachment-0001.html>
David Greene via llvm-dev
2018-Aug-07 18:18 UTC
[llvm-dev] Generating a loop from llvm bitcode
"Ejjeh, Adel via llvm-dev" <llvm-dev at lists.llvm.org> writes:> Hello > > I am developing a backend that generates OpenCL from llvm bitcode. I > start with a revived fork of the original LLVM C-Backend and have been > modifying it.How far do you intend to take this? Reconstructing high-level control flow from a CFG is not possible in the general case without code duplication. One can do quite a lot but the nature of low-level optimization means that at some point you'll likely need gotos, unless code expansion is not a concern. And even then, the result won't look much like the source program (though you probably don't care). Beyond that, reconstructing high-level characteristics of values (finding induction variables, getting proper array dimensions, etc.) is even more challenging. One can't reconstruct original C struct types, for example, because many of them get merged into a single LLVM type. One might be able to do a bit better with TBAA but frontends would have to save a lot more information in the IR to allow general translation. There are many things in LLVM that cannot be expressed in C. C has no "shufflevector" operation, for example. At best one can try to emit (non-portable) intrinsics which may or may not be acceptable. Or emit some kind of awful loop, I guess. The C backend was riddled with problems even beyond the above issues. If I were going to attempt something like this I don't think I'd start with that. It's not a hopeless project as long as you manage your expectations. :) -David
Ejjeh, Adel via llvm-dev
2018-Aug-07 18:57 UTC
[llvm-dev] Generating a loop from llvm bitcode
Thanks for your input David. I am aware of the concerns that you brought up, and I am aware about the issues of code duplication. I am willing to deal with duplication, but I need to have the proper loop structure. That’s why my main concern right now is to be able to identify the induction variable of the loop so that I can print out the “for …” statement. Even if I am only able to start out with basic loops. The main issue that I haven’t been able to find a straight-foward work around for is determining the loop exit condition, and that is mainly what I am trying to get direction for in my original question. It is unfortunate that the InductionDescriptor data structure stores everything you can think of about an induction variable, except it’s End value. -Adel> On Aug 7, 2018, at 1:18 PM, David Greene <dag at cray.com> wrote: > > > "Ejjeh, Adel via llvm-dev" <llvm-dev at lists.llvm.org> writes: > >> Hello >> >> I am developing a backend that generates OpenCL from llvm bitcode. I >> start with a revived fork of the original LLVM C-Backend and have been >> modifying it. > > How far do you intend to take this? Reconstructing high-level control > flow from a CFG is not possible in the general case without code > duplication. One can do quite a lot but the nature of low-level > optimization means that at some point you'll likely need gotos, unless > code expansion is not a concern. And even then, the result won't look > much like the source program (though you probably don't care). > > Beyond that, reconstructing high-level characteristics of values > (finding induction variables, getting proper array dimensions, etc.) is > even more challenging. One can't reconstruct original C struct types, > for example, because many of them get merged into a single LLVM type. > One might be able to do a bit better with TBAA but frontends would have > to save a lot more information in the IR to allow general translation. > > There are many things in LLVM that cannot be expressed in C. C has no > "shufflevector" operation, for example. At best one can try to emit > (non-portable) intrinsics which may or may not be acceptable. Or emit > some kind of awful loop, I guess. > > The C backend was riddled with problems even beyond the above issues. > If I were going to attempt something like this I don't think I'd start > with that. > > It's not a hopeless project as long as you manage your expectations. :) > > -David
Cranmer, Joshua via llvm-dev
2018-Aug-07 21:41 UTC
[llvm-dev] Generating a loop from llvm bitcode
The term for what you are trying to do in general is known as “control-flow graph structuring,” and you can find a bevy of papers on the topic. As David pointed out already, you can’t necessarily structure a generic CFG using goto, but I will caution that there are some constructs that are basically gotos for the purpose of structuring, namely break and continue (indeed, with multilevel break and continue statements, you can structure any non-irreducible graph without a goto). Figuring out which edges should be handled as gotos/breaks/continues and which edges should not is something that is not yet settled research. As for your actual goal, you might have a better time of things if you start by generating an infinite loop, with breaks for exiting edges and continues for backedges, and then try to remove those via refinement steps (which might work better to handle situations such as loop conditions involving short-circuiting operators). In the case of loops with exactly one exiting block terminating with a conditional branch, the exiting condition is exactly the branch condition (up to negation) of that basic block. In general, doing this sort of analysis safely for generic CFGs is quite difficult, and you should approach the problem from the perspective that you might only sometimes be able to get this information. From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of Ejjeh, Adel via llvm-dev Sent: Tuesday, August 7, 2018 12:52 To: llvm-dev at lists.llvm.org Subject: [llvm-dev] Generating a loop from llvm bitcode Hello I am developing a backend that generates OpenCL from llvm bitcode. I start with a revived fork of the original LLVM C-Backend and have been modifying it. One thing that the backend lacked was generating proper loop structures; instead it relied on labels and goto statements. Therefore, I am trying to find a way to identify the loop structure and print it out in the backend. So far, the main issue I’ve been having trouble with is identifying the loop induction variable in a straight forward manner; getCanonicalInductionVariable doesn’t always succeed, and I’ve tried using InductionDescriptor, but it fails in cases where the CFG doesn’t include a loop preheader. Also, in both cases, I couldn’t find a direct way of determining the loop exit condition. I am wondering if there are any passes or data structures that would be able to help me accomplish what I am trying to do that I might be missing? Or does the community have any other ideas as to how I should approach this issue? Any input is much appreciated. Regards Adel -- Adel Ejjeh PhD Candidate | Computer Science University of Illinois at Urbana-Champaign Siebel Center for Computer Science 201 N Goodwin Ave, Urbana, IL 61801 aejjeh at illinois.edu<mailto:aejjeh at illinois.edu> | adel.ejjeh at gmail.com<mailto:adel.ejjeh at gmail.com> -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20180807/21250e37/attachment.html>
James Courtier-Dutton via llvm-dev
2018-Aug-12 10:20 UTC
[llvm-dev] Generating a loop from llvm bitcode
On 7 August 2018 at 17:51, Ejjeh, Adel via llvm-dev <llvm-dev at lists.llvm.org> wrote:> Hello > > I am developing a backend that generates OpenCL from llvm bitcode. I start > with a revived fork of the original LLVM C-Backend and have been modifying > it. One thing that the backend lacked was generating proper loop structures; > instead it relied on labels and goto statements. Therefore, I am trying to > find a way to identify the loop structure and print it out in the backend. > So far, the main issue I’ve been having trouble with is identifying the loop > induction variable in a straight forward manner; > getCanonicalInductionVariable doesn’t always succeed, and I’ve tried using > InductionDescriptor, but it fails in cases where the CFG doesn’t include a > loop preheader. Also, in both cases, I couldn’t find a direct way of > determining the loop exit condition. I am wondering if there are any passes > or data structures that would be able to help me accomplish what I am trying > to do that I might be missing? Or does the community have any other ideas as > to how I should approach this issue? > > Any input is much appreciated. >Hi, What you are trying to do is essentially very similar to a decompiler. e.g. https://github.com/jcdutton/libbeauty A compiler does the following steps: OpenCL -> AST -> LLVM IR -> Machine Instructions -> Binary A Decompiler is: Binary -> Machine Instruction -> LLVM IR -> AST -> OpenCL The step you are trying is going from LLVM IR -> AST -> OpenCL. The LLVM IR can be thought of as the CFG. (Control Flow Graph) The AST is the Abstract Syntax Tree. The step from LLVM IR -> AST is generally referred to as “control-flow graph structuring”. This is when you take a CFG (if ... then ... goto) to a more structured representation such as for...loop, do...while. I am currently writing my own decompiler. https://github.com/jcdutton/libbeauty I tried the AST step some time ago. It is difficult to do. So, for now, I am concentrating on Binary -> LLVM IR. I will do the AST step later. But, during my AST work, I discovered patterns of CFG that are just not describable in any higher level structure even though the CLANG/LLVM compiler probably produced that CFG after some optimizations. So, when I revisit the AST step, I am going to look at which LLVM optimizations produced those strange CFG patterns, and write LLVM IR passes that "undo" them and thus create a CFG that is more easily converted in the AST structure. One trick to the AST step is first identifying the "Entry Point" to your function. Then identifying the "Exit Point" of your function. If there are multiple "Exit Points", Modify the LLVM IR to merge them all. E.g. Adding "goto exit:" This merging, makes it a lot easier for the graph algorithms to do their task better. Then use simple graph algorithms, control flow path analysis, to arrange all the basic blocks in a sensible order for the graph. E.g. Entry point at the top, and Exit point at the bottom of the page. In my "libbeauty" decompiler, I use xdot and graphviz to output pretty graph pictures of the CFG. It is very easy to spot loops using this method, and from that selecting for...loop or do...while structures. My CFG graph has nodes, and different colored links between them. Green=Branch-due-to-True-Forwards Red=Branch-due-to-False-Forwards Yellow= Branch-Backwards. If you have not done already, read up on graph theory. You might also need to write some of your own LLVM IR passes, to generally translate the LLVM IR into a form more suited for structuring and raising to OpenCL. Kind Regards James
Reasonably Related Threads
- Accessing Function analyses in a Module pass with the new pass manager
- Incorrect behavior in the LLVM dependence analyzer
- LLVM pass to optimize redundant branch conditions
- [LLVMdev] [OT] Control Flow Graph(CFG) into Abstract Syntax Tree(AST)
- Function attributes for memory side-effects