thr3ads.net - llvm dev - [llvm-dev] Generating a loop from llvm bitcode [Aug 2018]

If this information is useful, please help other people find it:
Share via:

Ejjeh, Adel via llvm-dev

2018-Aug-07 16:51 UTC

[llvm-dev] Generating a loop from llvm bitcode

Hello

I am developing a backend that generates OpenCL from llvm bitcode. I start with
a revived fork of the original LLVM C-Backend and have been modifying it. One
thing that the backend lacked was generating proper loop structures; instead it
relied on labels and goto statements. Therefore, I am trying to find a way to
identify the loop structure and print it out in the backend. So far, the main
issue I’ve been having trouble with is identifying the loop induction variable
in a straight forward manner; getCanonicalInductionVariable doesn’t always
succeed, and I’ve tried using InductionDescriptor, but it fails in cases where
the CFG doesn’t include a loop preheader. Also, in both cases, I couldn’t find a
direct way of determining the loop exit condition. I am wondering if there are
any passes or data structures that would be able to help me accomplish what I am
trying to do that I might be missing? Or does the community have any other ideas
as to how I should approach this issue?

Any input is much appreciated.

Regards
Adel
--
Adel Ejjeh
PhD Candidate | Computer Science
University of Illinois at Urbana-Champaign
Siebel Center for Computer Science
201 N Goodwin Ave, Urbana, IL 61801
aejjeh at illinois.edu<mailto:aejjeh at illinois.edu> | adel.ejjeh at
gmail.com<mailto:adel.ejjeh at gmail.com>

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20180807/ffe4172f/attachment-0001.html>

David Greene via llvm-dev

2018-Aug-07 18:18 UTC

head link

[llvm-dev] Generating a loop from llvm bitcode

"Ejjeh, Adel via llvm-dev" <llvm-dev at lists.llvm.org> writes:
> Hello 
>
> I am developing a backend that generates OpenCL from llvm bitcode. I
> start with a revived fork of the original LLVM C-Backend and have been
> modifying it. 
How far do you intend to take this?  Reconstructing high-level control
flow from a CFG is not possible in the general case without code
duplication.  One can do quite a lot but the nature of low-level
optimization means that at some point you'll likely need gotos, unless
code expansion is not a concern.  And even then, the result won't look
much like the source program (though you probably don't care).

Beyond that, reconstructing high-level characteristics of values
(finding induction variables, getting proper array dimensions, etc.) is
even more challenging.  One can't reconstruct original C struct types,
for example, because many of them get merged into a single LLVM type.
One might be able to do a bit better with TBAA but frontends would have
to save a lot more information in the IR to allow general translation.

There are many things in LLVM that cannot be expressed in C.  C has no
"shufflevector" operation, for example.  At best one can try to emit
(non-portable) intrinsics which may or may not be acceptable.  Or emit
some kind of awful loop, I guess.

The C backend was riddled with problems even beyond the above issues.
If I were going to attempt something like this I don't think I'd start
with that.

It's not a hopeless project as long as you manage your expectations.  :)

                          -David

Ejjeh, Adel via llvm-dev

2018-Aug-07 18:57 UTC

head link

[llvm-dev] Generating a loop from llvm bitcode

Thanks for your input David. I am aware of the concerns that you brought up, and
I am aware about the issues of code duplication. I am willing to deal with
duplication, but I need to have the proper loop structure. That’s why my main
concern right now is to be able to identify the induction variable of the loop
so that I can print out the “for …” statement. Even if I am only able to start
out with basic loops. The main issue that I haven’t been able to find a
straight-foward work around for is determining the loop exit condition, and that
is mainly what I am trying to get direction for in my original question. It is
unfortunate that the InductionDescriptor data structure stores everything you
can think of about an induction variable, except it’s End value.

-Adel
> On Aug 7, 2018, at 1:18 PM, David Greene <dag at cray.com> wrote:
> 
> 
> "Ejjeh, Adel via llvm-dev" <llvm-dev at lists.llvm.org>
writes:
> 
>> Hello 
>> 
>> I am developing a backend that generates OpenCL from llvm bitcode. I
>> start with a revived fork of the original LLVM C-Backend and have been
>> modifying it. 
> 
> How far do you intend to take this?  Reconstructing high-level control
> flow from a CFG is not possible in the general case without code
> duplication.  One can do quite a lot but the nature of low-level
> optimization means that at some point you'll likely need gotos, unless
> code expansion is not a concern.  And even then, the result won't look
> much like the source program (though you probably don't care).
> 
> Beyond that, reconstructing high-level characteristics of values
> (finding induction variables, getting proper array dimensions, etc.) is
> even more challenging.  One can't reconstruct original C struct types,
> for example, because many of them get merged into a single LLVM type.
> One might be able to do a bit better with TBAA but frontends would have
> to save a lot more information in the IR to allow general translation.
> 
> There are many things in LLVM that cannot be expressed in C.  C has no
> "shufflevector" operation, for example.  At best one can try to
emit
> (non-portable) intrinsics which may or may not be acceptable.  Or emit
> some kind of awful loop, I guess.
> 
> The C backend was riddled with problems even beyond the above issues.
> If I were going to attempt something like this I don't think I'd
start
> with that.
> 
> It's not a hopeless project as long as you manage your expectations. 
:)
> 
>                          -David

Cranmer, Joshua via llvm-dev

2018-Aug-07 21:41 UTC

head link

[llvm-dev] Generating a loop from llvm bitcode

The term for what you are trying to do in general is known as “control-flow
graph structuring,” and you can find a bevy of papers on the topic. As David
pointed out already, you can’t necessarily structure a generic CFG using goto,
but I will caution that there are some constructs that are basically gotos for
the purpose of structuring, namely break and continue (indeed, with multilevel
break and continue statements, you can structure any non-irreducible graph
without a goto). Figuring out which edges should be handled as
gotos/breaks/continues and which edges should not is something that is not yet
settled research.

As for your actual goal, you might have a better time of things if you start by
generating an infinite loop, with breaks for exiting edges and continues for
backedges, and then try to remove those via refinement steps (which might work
better to handle situations such as loop conditions involving short-circuiting
operators). In the case of loops with exactly one exiting block terminating with
a conditional branch, the exiting condition is exactly the branch condition (up
to negation) of that basic block. In general, doing this sort of analysis safely
for generic CFGs is quite difficult, and you should approach the problem from
the perspective that you might only sometimes be able to get this information.


From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of Ejjeh,
Adel via llvm-dev
Sent: Tuesday, August 7, 2018 12:52
To: llvm-dev at lists.llvm.org
Subject: [llvm-dev] Generating a loop from llvm bitcode

Hello

I am developing a backend that generates OpenCL from llvm bitcode. I start with
a revived fork of the original LLVM C-Backend and have been modifying it. One
thing that the backend lacked was generating proper loop structures; instead it
relied on labels and goto statements. Therefore, I am trying to find a way to
identify the loop structure and print it out in the backend. So far, the main
issue I’ve been having trouble with is identifying the loop induction variable
in a straight forward manner; getCanonicalInductionVariable doesn’t always
succeed, and I’ve tried using InductionDescriptor, but it fails in cases where
the CFG doesn’t include a loop preheader. Also, in both cases, I couldn’t find a
direct way of determining the loop exit condition. I am wondering if there are
any passes or data structures that would be able to help me accomplish what I am
trying to do that I might be missing? Or does the community have any other ideas
as to how I should approach this issue?

Any input is much appreciated.

Regards
Adel
--
Adel Ejjeh
PhD Candidate | Computer Science
University of Illinois at Urbana-Champaign
Siebel Center for Computer Science
201 N Goodwin Ave, Urbana, IL 61801
aejjeh at illinois.edu<mailto:aejjeh at illinois.edu> | adel.ejjeh at
gmail.com<mailto:adel.ejjeh at gmail.com>



-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20180807/21250e37/attachment.html>

James Courtier-Dutton via llvm-dev

2018-Aug-12 10:20 UTC

head link

[llvm-dev] Generating a loop from llvm bitcode

On 7 August 2018 at 17:51, Ejjeh, Adel via llvm-dev
<llvm-dev at lists.llvm.org> wrote:> Hello
>
> I am developing a backend that generates OpenCL from llvm bitcode. I start
> with a revived fork of the original LLVM C-Backend and have been modifying
> it. One thing that the backend lacked was generating proper loop
structures;
> instead it relied on labels and goto statements. Therefore, I am trying to
> find a way to identify the loop structure and print it out in the backend.
> So far, the main issue I’ve been having trouble with is identifying the
loop
> induction variable in a straight forward manner;
> getCanonicalInductionVariable doesn’t always succeed, and I’ve tried using
> InductionDescriptor, but it fails in cases where the CFG doesn’t include a
> loop preheader. Also, in both cases, I couldn’t find a direct way of
> determining the loop exit condition. I am wondering if there are any passes
> or data structures that would be able to help me accomplish what I am
trying
> to do that I might be missing? Or does the community have any other ideas
as
> to how I should approach this issue?
>
> Any input is much appreciated.
>
Hi,

What you are trying to do is essentially very similar to a decompiler.
e.g. https://github.com/jcdutton/libbeauty
A compiler does the following steps:
OpenCL -> AST -> LLVM IR -> Machine Instructions -> Binary
A Decompiler is:
Binary -> Machine Instruction -> LLVM IR -> AST -> OpenCL

The step you are trying is going from LLVM IR -> AST -> OpenCL.
The LLVM IR can be thought of as the CFG. (Control Flow Graph)
The AST is the Abstract Syntax Tree.
The step from LLVM IR -> AST is generally referred to as “control-flow
graph structuring”.
This is when you take a CFG (if ... then ... goto)  to a more
structured representation such as for...loop,  do...while.

I am currently writing my own decompiler. https://github.com/jcdutton/libbeauty
I tried the AST step some time ago. It is difficult to do. So, for
now, I am concentrating on Binary -> LLVM IR. I will do the AST step
later.
But, during my AST work, I discovered patterns of CFG that are just
not describable in any higher level structure even though the
CLANG/LLVM compiler probably produced that CFG after some
optimizations.
So, when I revisit the AST step, I am going to look at which LLVM
optimizations produced those strange CFG patterns, and write LLVM IR
passes that "undo" them and thus create a CFG that is more easily
converted in the AST structure.
One trick to the AST step is first identifying the "Entry Point" to
your function.
Then identifying the "Exit Point" of your function.
If there are multiple "Exit Points", Modify the LLVM IR to merge them
all. E.g. Adding "goto exit:"
This merging, makes it a lot easier for the graph algorithms to do
their task better.
Then use simple graph algorithms, control flow path analysis, to
arrange all the basic blocks in a sensible order for the graph.
E.g. Entry point at the top, and Exit point at the bottom of the page.
In my "libbeauty" decompiler, I use xdot and graphviz to output pretty
graph pictures of the CFG.
It is very easy to spot loops using this method, and from that
selecting for...loop or do...while structures.
My CFG graph has nodes, and different colored links between them.
Green=Branch-due-to-True-Forwards
Red=Branch-due-to-False-Forwards
Yellow= Branch-Backwards.

If you have not done already, read up on graph theory.
You might also need to write some of your own LLVM IR passes, to
generally translate the LLVM IR into a form more suited for
structuring and raising to OpenCL.

Kind Regards

James

Reasonably Related Threads

Search for more seemingly similar threads

llvm dev - Aug 2018 - Generating a loop from llvm bitcode

[llvm-dev] Generating a loop from llvm bitcode

[llvm-dev] Generating a loop from llvm bitcode

[llvm-dev] Generating a loop from llvm bitcode

[llvm-dev] Generating a loop from llvm bitcode

[llvm-dev] Generating a loop from llvm bitcode

Reasonably Related Threads