thr3ads.net - llvm dev - [LLVMdev] Lifting ASM to IR [Mar 2015]

If this information is useful, please help other people find it:
Share via:

Jonathan Roelofs

2015-Mar-13 14:47 UTC

[LLVMdev] Lifting ASM to IR

On 3/12/15 8:14 PM, Daniel Dilts wrote:> On Thu, Mar 12, 2015 at 6:33 PM, Ahmed Bougacha
> <ahmed.bougacha at gmail.com <mailto:ahmed.bougacha at
gmail.com>> wrote:
>
>     > On Thu, Mar 12, 2015 at 05:44:02PM -0700, Daniel Dilts wrote:
>     >> Does there exist a tool that could lift a binary (assembly for
some
>     >> supported target) to LLVM IR?  If there isn't, does this
seem like
>     >> something that would be feasible?
>
>     There's plenty of variations on the idea: Revgen/S2E, Fracture,
Dagger
>     (my own), libcpu, several closed-source ones used by pentest shops,
>     some that use another representation before going to IR (say
>     llvm-qemu),  and probably others still I forgot about.
>
>     Are you interested in a specific target / use case?
>
>
> I was thinking something along the lines of lifting a binary into IR and
> spitting it out for a different processor.
This is going to be extremely difficult. Imagine for example how this 
function would be compiled:

   struct Foo {
     void *v;
     int i;
     long l;
   };

   long bar(Foo *f) {
     return f->l;
   }

If we pick a particular target, and compile this function for that, then 
'foo' will have some offset into the struct from which it loads
'l'.
This is easy because we know the sizes of the struct's members, and the 
layout rules for structs on the target.

Now turn that around: given an offset into a struct for one target, 
what's the offset into the same struct on another target? We're stuck 
because we can't reconstruct from this offset what the sizes of v and i 
are individually; all we have is their sum (and that doesn't even take 
alignment issues into account).  This is because when we're looking at 
the binary we don't know, given that offset, that the two elements in 
front of it are a void* and an int.

Now, you might think that: "well, okay we'll just use the offsets from 
one target in the other target's binaries". But that isn't going to
work
either: what if void* isn't the same size between the two targets? And 
that's just the tip of the iceberg.

TL;DR: binary translation is a very difficult problem.

Jon
>
> Or maybe a decompiling tool of some kind.
>
>
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
-- 
Jon Roelofs
jonathan at codesourcery.com
CodeSourcery / Mentor Embedded

Daniel Dilts

2015-Mar-13 16:15 UTC

head link

[LLVMdev] Lifting ASM to IR

On Fri, Mar 13, 2015 at 7:47 AM, Jonathan Roelofs <jonathan at
codesourcery.com> wrote:
>
>
> On 3/12/15 8:14 PM, Daniel Dilts wrote:
>
>> On Thu, Mar 12, 2015 at 6:33 PM, Ahmed Bougacha
>> <ahmed.bougacha at gmail.com <mailto:ahmed.bougacha at
gmail.com>> wrote:
>>
>>     > On Thu, Mar 12, 2015 at 05:44:02PM -0700, Daniel Dilts wrote:
>>     >> Does there exist a tool that could lift a binary (assembly
for some
>>     >> supported target) to LLVM IR?  If there isn't, does
this seem like
>>     >> something that would be feasible?
>>
>>     There's plenty of variations on the idea: Revgen/S2E, Fracture,
Dagger
>>     (my own), libcpu, several closed-source ones used by pentest shops,
>>     some that use another representation before going to IR (say
>>     llvm-qemu),  and probably others still I forgot about.
>>
>>     Are you interested in a specific target / use case?
>>
>>
>> I was thinking something along the lines of lifting a binary into IR
and
>> spitting it out for a different processor.
>>
>
> This is going to be extremely difficult. Imagine for example how this
> function would be compiled:
>
>   struct Foo {
>     void *v;
>     int i;
>     long l;
>   };
>
>   long bar(Foo *f) {
>     return f->l;
>   }
>
> If we pick a particular target, and compile this function for that, then
> 'foo' will have some offset into the struct from which it loads
'l'. This
> is easy because we know the sizes of the struct's members, and the
layout
> rules for structs on the target.
>
> Now turn that around: given an offset into a struct for one target,
what's
> the offset into the same struct on another target? We're stuck because
we
> can't reconstruct from this offset what the sizes of v and i are
> individually; all we have is their sum (and that doesn't even take
> alignment issues into account).  This is because when we're looking at
the
> binary we don't know, given that offset, that the two elements in front
of
> it are a void* and an int.
>
> Now, you might think that: "well, okay we'll just use the offsets
from one
> target in the other target's binaries". But that isn't going
to work
> either: what if void* isn't the same size between the two targets? And
> that's just the tip of the iceberg.
>
> TL;DR: binary translation is a very difficult problem.

I was afraid of something like that.  I was thinking that translating
function calls would be an issue; I didn't think about data layout.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150313/236cfbb5/attachment.html>

yekkas

2015-Mar-13 16:26 UTC

head link

[LLVMdev] Lifting ASM to IR

On 03/13/2015 05:47 PM, Jonathan Roelofs wrote:
 > what if void* isn't the same size between the two targets?

I believe that the translated program has no other choice but to use the 
same
pointer size and data layout as the original one. Though, memory 
operations have
to be modified to map one memory space into another properly.


On 03/13/2015 05:47 PM, Jonathan Roelofs wrote:>
>
> On 3/12/15 8:14 PM, Daniel Dilts wrote:
>> On Thu, Mar 12, 2015 at 6:33 PM, Ahmed Bougacha
>> <ahmed.bougacha at gmail.com <mailto:ahmed.bougacha at
gmail.com>> wrote:
>>
>>     > On Thu, Mar 12, 2015 at 05:44:02PM -0700, Daniel Dilts wrote:
>>     >> Does there exist a tool that could lift a binary (assembly
for
>> some
>>     >> supported target) to LLVM IR?  If there isn't, does
this seem
>> like
>>     >> something that would be feasible?
>>
>>     There's plenty of variations on the idea: Revgen/S2E, Fracture,
>> Dagger
>>     (my own), libcpu, several closed-source ones used by pentest shops,
>>     some that use another representation before going to IR (say
>>     llvm-qemu),  and probably others still I forgot about.
>>
>>     Are you interested in a specific target / use case?
>>
>>
>> I was thinking something along the lines of lifting a binary into IR
and
>> spitting it out for a different processor.
>
> This is going to be extremely difficult. Imagine for example how this 
> function would be compiled:
>
>   struct Foo {
>     void *v;
>     int i;
>     long l;
>   };
>
>   long bar(Foo *f) {
>     return f->l;
>   }
>
> If we pick a particular target, and compile this function for that, 
> then 'foo' will have some offset into the struct from which it
loads
> 'l'. This is easy because we know the sizes of the struct's
members,
> and the layout rules for structs on the target.
>
> Now turn that around: given an offset into a struct for one target, 
> what's the offset into the same struct on another target? We're
stuck
> because we can't reconstruct from this offset what the sizes of v and 
> i are individually; all we have is their sum (and that doesn't even 
> take alignment issues into account).  This is because when we're 
> looking at the binary we don't know, given that offset, that the two 
> elements in front of it are a void* and an int.
>
> Now, you might think that: "well, okay we'll just use the offsets
from
> one target in the other target's binaries". But that isn't
going to
> work either: what if void* isn't the same size between the two 
> targets? And that's just the tip of the iceberg.
>
> TL;DR: binary translation is a very difficult problem.
>
>
> Jon
>
>>
>> Or maybe a decompiling tool of some kind.
>>
>>
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>
>

mats petersson

2015-Mar-13 16:58 UTC

head link

[LLVMdev] Lifting ASM to IR

Having spent my formative years (or whatever it's called) many years
back writing a disassembler that produced "re-assemblable" code (that
worked most of the time), I'm quite familiar with the fundamentals of
simply knowing "what is read-only data, what is instructions". I had
to invent several heuristics to solve for example jump-tables [switch
statement] that are PC-relative or text-strings embedded in the "code"
section  - both typically "look kind of like instructions, but aren't
really" - strings are a little less difficult than jump-tables, but
both can be very tricky - especially if they are relatively short
sequences (is "F\n" an instruction or string?)

If you then want to abstract it further, to a higher level language
(e.g LLVM IR), you have to actually undersstand the semantics of each
instruction.

Calls are typically not so bad - you can figure out from the system
calling convention what is part of the call and what isn't (assuming
the calling convention of the compiler is documented, of course)

--
Mats

On 13 March 2015 at 16:26, yekkas <yekkas at gmail.com>
wrote:> On 03/13/2015 05:47 PM, Jonathan Roelofs wrote:
>> what if void* isn't the same size between the two targets?
>
> I believe that the translated program has no other choice but to use the
> same
> pointer size and data layout as the original one. Though, memory operations
> have
> to be modified to map one memory space into another properly.
>
>
>
> On 03/13/2015 05:47 PM, Jonathan Roelofs wrote:
>>
>>
>>
>> On 3/12/15 8:14 PM, Daniel Dilts wrote:
>>>
>>> On Thu, Mar 12, 2015 at 6:33 PM, Ahmed Bougacha
>>> <ahmed.bougacha at gmail.com <mailto:ahmed.bougacha at
gmail.com>> wrote:
>>>
>>>     > On Thu, Mar 12, 2015 at 05:44:02PM -0700, Daniel Dilts
wrote:
>>>     >> Does there exist a tool that could lift a binary
(assembly for
>>> some
>>>     >> supported target) to LLVM IR?  If there isn't,
does this seem like
>>>     >> something that would be feasible?
>>>
>>>     There's plenty of variations on the idea: Revgen/S2E,
Fracture,
>>> Dagger
>>>     (my own), libcpu, several closed-source ones used by pentest
shops,
>>>     some that use another representation before going to IR (say
>>>     llvm-qemu),  and probably others still I forgot about.
>>>
>>>     Are you interested in a specific target / use case?
>>>
>>>
>>> I was thinking something along the lines of lifting a binary into
IR and
>>> spitting it out for a different processor.
>>
>>
>> This is going to be extremely difficult. Imagine for example how this
>> function would be compiled:
>>
>>   struct Foo {
>>     void *v;
>>     int i;
>>     long l;
>>   };
>>
>>   long bar(Foo *f) {
>>     return f->l;
>>   }
>>
>> If we pick a particular target, and compile this function for that,
then
>> 'foo' will have some offset into the struct from which it loads
'l'. This is
>> easy because we know the sizes of the struct's members, and the
layout rules
>> for structs on the target.
>>
>> Now turn that around: given an offset into a struct for one target,
what's
>> the offset into the same struct on another target? We're stuck
because we
>> can't reconstruct from this offset what the sizes of v and i are
>> individually; all we have is their sum (and that doesn't even take
alignment
>> issues into account).  This is because when we're looking at the
binary we
>> don't know, given that offset, that the two elements in front of it
are a
>> void* and an int.
>>
>> Now, you might think that: "well, okay we'll just use the
offsets from one
>> target in the other target's binaries". But that isn't
going to work either:
>> what if void* isn't the same size between the two targets? And
that's just
>> the tip of the iceberg.
>>
>> TL;DR: binary translation is a very difficult problem.
>>
>>
>> Jon
>>
>>>
>>> Or maybe a decompiling tool of some kind.
>>>
>>>
>>>
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>>
>>
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

llvm dev - Mar 2015 - [LLVMdev] Lifting ASM to IR

[LLVMdev] Lifting ASM to IR

[LLVMdev] Lifting ASM to IR

[LLVMdev] Lifting ASM to IR

[LLVMdev] Lifting ASM to IR