thr3ads.net - llvm dev - [llvm-dev] Decompilation and the SSA form [Jan 2016]

If this information is useful, please help other people find it:
Share via:

Carter Cheng via llvm-dev

2016-Jan-23 15:00 UTC

[llvm-dev] Decompilation and the SSA form

Hi,

Is decompilation possible in general to the SSA form for binaries? I assume
one has to make certain assumptions about code in general to get tools like
these to work. For example if code like with dlsym or jit heap allocated
functions can be incorporated at runtime it would seem that in general it
is quite difficult to ascertain the boundaries of a basic block and insert
the correct phi functions for the predecessors since one could have jumps
from the new code into the middle of the static code. This is already
ignoring the problem of self modification.

I haven't managed to find many references on the topic but I am curious
about what sorts of assumptions are made in decompilation code recovery and
general issues theoretical issues about when it is possible to do.

Thanks in advance,

Carter,
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160123/4747a91e/attachment.html>

John Criswell via llvm-dev

2016-Jan-23 15:30 UTC

head link

[llvm-dev] Decompilation and the SSA form

Dear Carter,

You should look at tools like s2e and Revgen (from EPFL) and BAP (from 
CMU, I think).  These projects convert binary code into LLVM IR.

The issues that you raise are somewhat orthogonal to conversion to SSA 
form.  Regardless of whether one is decompiling machine code to an 
SSA-based IR or a non-SSA IR, one must deal with the challenges of 
disassembly (if doing static decompilation), self-modifying code, and 
control-flow graph reconstruction.  Once you get some sort of IR 
reconstructed, standard algorithms for converting the IR into SSA form 
apply.

Regards,

John Criswell


On 1/23/16 10:00 AM, Carter Cheng via llvm-dev wrote:> Hi,
>
> Is decompilation possible in general to the SSA form for binaries? I 
> assume one has to make certain assumptions about code in general to 
> get tools like these to work. For example if code like with dlsym or 
> jit heap allocated functions can be incorporated at runtime it would 
> seem that in general it is quite difficult to ascertain the boundaries 
> of a basic block and insert the correct phi functions for the 
> predecessors since one could have jumps from the new code into the 
> middle of the static code. This is already ignoring the problem of 
> self modification.
>
> I haven't managed to find many references on the topic but I am 
> curious about what sorts of assumptions are made in decompilation code 
> recovery and general issues theoretical issues about when it is 
> possible to do.
>
> Thanks in advance,
>
> Carter,
>
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

-- 
John Criswell
Assistant Professor
Department of Computer Science, University of Rochester
http://www.cs.rochester.edu/u/criswell

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160123/fcb0b01d/attachment-0001.html>

Carter Cheng via llvm-dev

2016-Jan-23 16:54 UTC

head link

[llvm-dev] Decompilation and the SSA form

Thanks for the reply. I was looking over the papers describing the
respective projects you mentioned and it seems that from my limited
understanding these systems tend  to finesse the more complicated issues.
s2e does not explicitly explore the code to generate the full code at one
time and instead analyzes an execution trace tree thus avoiding the need
for merge point analysis via phi functions (since it does symbolic
execution I guess this makes sense). Revgen does not cope with the runtime
code "closure" problem and BAP uses a custom probably lower level IR.

The main thing I don't understand about mapping machine code to SSA is that
given that the control flow graph is not guaranteed to be static with both
runtime generated code and indirect jumps how is it possible to specify the
phi functions when the standard algorithm relies on these static graph like
properties.

Thanks again,

Carter.

On Sat, Jan 23, 2016 at 11:30 PM, John Criswell <jtcriswel at gmail.com>
wrote:
> Dear Carter,
>
> You should look at tools like s2e and Revgen (from EPFL) and BAP (from
> CMU, I think).  These projects convert binary code into LLVM IR.
>
> The issues that you raise are somewhat orthogonal to conversion to SSA
> form.  Regardless of whether one is decompiling machine code to an
> SSA-based IR or a non-SSA IR, one must deal with the challenges of
> disassembly (if doing static decompilation), self-modifying code, and
> control-flow graph reconstruction.  Once you get some sort of IR
> reconstructed, standard algorithms for converting the IR into SSA form
> apply.
>
> Regards,
>
> John Criswell
>
>
>
> On 1/23/16 10:00 AM, Carter Cheng via llvm-dev wrote:
>
> Hi,
>
> Is decompilation possible in general to the SSA form for binaries? I
> assume one has to make certain assumptions about code in general to get
> tools like these to work. For example if code like with dlsym or jit heap
> allocated functions can be incorporated at runtime it would seem that in
> general it is quite difficult to ascertain the boundaries of a basic block
> and insert the correct phi functions for the predecessors since one could
> have jumps from the new code into the middle of the static code. This is
> already ignoring the problem of self modification.
>
> I haven't managed to find many references on the topic but I am curious
> about what sorts of assumptions are made in decompilation code recovery and
> general issues theoretical issues about when it is possible to do.
>
> Thanks in advance,
>
> Carter,
>
>
> _______________________________________________
> LLVM Developers mailing listllvm-dev at
lists.llvm.orghttp://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>
>
> --
> John Criswell
> Assistant Professor
> Department of Computer Science, University of
Rochesterhttp://www.cs.rochester.edu/u/criswell
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160124/d061ac61/attachment.html>

Reasonably Related Threads

Search for more reasonably related threads

llvm dev - Jan 2016 - Decompilation and the SSA form

[llvm-dev] Decompilation and the SSA form

[llvm-dev] Decompilation and the SSA form

[llvm-dev] Decompilation and the SSA form

Reasonably Related Threads