thr3ads.net - llvm dev - [LLVMdev] How to correlate LLVA with native ISA [Dec 2008]

Keun Soo Yim

2008-Dec-14 09:15 UTC

[LLVMdev] How to correlate LLVA with native ISA

Thank you for your reply.

The reason this information is needed is that I am trying to extract the
program signature (e.g., control flow) outside of the binary. Conventional
compiler techniques add extra checking code to the target source or target
IR in an invasive manner. Since the code generator combines the added code
with the original code, they do not need to correlate the two.

It is being implemented as an LLVM analysis pass using MachineFunctionPass.
Given a MachineBasicBlock or MachineInstruction, is it possible to get the
starting address of the real basic block and the exact (runtime) address of
the real instruction?

Unlike my FunctionPass, my MachineFunctionPass produces an error when it is
loaded by opt. The class has a constructor and virtfn, runOnMachineFunction,
getPassName, runOnFunction, and getAnalysisUsage methods, all with empty
bodies. Has anyone had a similar problem?

Error opening '../../src/Release/lib/SWP.so':
../../src/Release/lib/SWP.so:
undefined symbol: _ZTIN4llvm19MachineFunctionPassE

(without virtfn() definition)
Error opening '../../src/Release/lib/SWP.so':
../../src/Release/lib/SWP.so:
undefined symbol: _ZN4llvm19MachineFunctionPass6virtfnEv

 Thanks in advance!
 - Keun Soo

On Mon, Dec 8, 2008 at 2:28 PM, John Criswell <criswell at uiuc.edu>
wrote:
> Can you tell us what goal you are trying to accomplish that requires you
> to do this?  There might be better ways of doing what you want.
>
> The answer to your question probably depends on whether you're trying to
> write a pure LLVM analysis/transform, a JIT, or can interpose at the
> static code generator.
>
> From working strictly with the LLVM IR, I don't believe this is
> possible.  There is no instruction that can give you the address of an
> LLVM instruction.  There are multiple reasons for this: first, it would
> allow one to write code that branches into the middle of basic blocks,
> making LLVM's analysis passes much more tedious to write.  Second,
> instructions may be expanded to multiple machine instructions or
> peephole optimized away into 0 instructions during code generation.
>
> If you're willing to work with the LLVM static code generator or JIT
> infrastructure, then things might be different.  The code generator may
> have knowledge of the correlation between LLVM IR instructions and
> native code instructions; you may be able to enhance it to get the
> information you need.
>
>
> >  Similarly, by implementing an LLVM IR-level pass, is it feasible to
> > get the runtime memory address
> >  of a LLVM IR-level variable in global area? Assume the data segment
> > base address is given.
> You should be able to find the address of anything that is link-time
> accessible: these include externally visible global variables and
> externally visible functions.  The name of a global variable is its
> location in physical memory.  Memory allocated by alloca and malloc are
> also guaranteed to be "real" memory locations; the value of the alloca
> or malloc is the location within real memory.
>
> It's not possible, however, to get the address of an SSA virtual
> register.  The code generator is free to put these into spill locations
> on the stack or into physical machine registers; in fact, the code
> generator can put an SSA value into different registers at different
> points in the function.
>
> You can do things like writing a transform that will take selected SSA
> registers and change them into alloca'ed or malloc'ed memory (or even
> global variables).  It will hurt performance, but it will allow you to
> get a pointer to the real memory location in which they're stored.
>
> >
> >  In the LLVM library, there are already some classes starting with
> > Machine but I was not able to find
> >  any existing methods that would give the above information.
> These are used for code generation.  Again, you may be able to do more
> fancy things at (static or dynamic) code generation time, but in pure
> LLVM IR transforms, your options are somewhat limited.
>
> -- John T.
>
> >
> >  Thanks in advance.
> >
> >  Best,
> >  Keun Soo
> >
> >
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
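
The quoted reply notes that one LLVM IR instruction may expand to several
machine instructions or to none at all. A minimal sketch of what a
correlation table could look like under that constraint follows. It is a toy
model and not LLVM code: the Lowered and AddressRange types and the
correlate function are hypothetical, and it assumes the machine instructions
for each IR instruction are emitted contiguously and in order, which a real
code generator does not guarantee.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical record: one IR instruction and the sizes (in bytes) of the
// machine instructions emitted for it. An empty vector models an IR
// instruction that was optimized away during code generation.
struct Lowered {
    std::string irName;
    std::vector<std::uint32_t> machineSizes;
};

// Half-open byte range [begin, end) covered by one IR instruction.
struct AddressRange {
    std::string irName;
    std::uint64_t begin;
    std::uint64_t end;
};

// Walk the lowered instructions in emission order and assign each IR
// instruction the address range covered by its machine instructions.
std::vector<AddressRange> correlate(const std::vector<Lowered>& code,
                                    std::uint64_t base) {
    std::vector<AddressRange> table;
    std::uint64_t addr = base;
    for (const Lowered& l : code) {
        const std::uint64_t begin = addr;
        for (std::uint32_t size : l.machineSizes) addr += size;
        table.push_back({l.irName, begin, addr});
    }
    return table;
}
```

An IR instruction that produced no machine code ends up with an empty range
(begin == end), so the result is a range per IR instruction rather than a
single address.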

Keun Soo Yim

2008-Dec-14 18:49 UTC

[LLVMdev] How to correlate LLVA with native ISA

Currently I use a trick to extract the correlation between LLVA and the
native ISA: I add an intrinsic instruction and compare the emitted binary
with the original one. The location of the machine instruction I am
interested in is then calculated relative to the intrinsic. Unless I replace
the original instruction with an intrinsic of the same size, this needs as
many iterations as there are instructions of interest. Even if I endure this
tedious repetition, it is not highly accurate, because if the target
instruction is not replaced by the intrinsic, it may be optimized by the
code generator.
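
The comparison step of this trick can be sketched as follows. This is a
minimal illustration, not the code used in the thread: firstDifference and
kNoDifference are hypothetical names, and it assumes that every byte before
the inserted marker is identical in both images, which may not hold if an
earlier branch displacement spans the insertion point.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sentinel returned when the two images are byte-for-byte identical.
const std::size_t kNoDifference = static_cast<std::size_t>(-1);

// Return the offset of the first byte at which the two images differ.
// If one image is a strict prefix of the other, the difference starts at
// the length of the shorter image.
std::size_t firstDifference(const std::vector<std::uint8_t>& original,
                            const std::vector<std::uint8_t>& instrumented) {
    const std::size_t common = std::min(original.size(), instrumented.size());
    for (std::size_t i = 0; i < common; ++i) {
        if (original[i] != instrumented[i]) return i;
    }
    if (original.size() != instrumented.size()) return common;
    return kNoDifference;
}
```

The returned offset is only a candidate for the marker's position; as the
message says, the code generator may still treat the surrounding
instructions differently in the two builds.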

Since this is possible using an intrinsic, and since the compiler has both
the IR and the machine code information and generates both of them, I
believe a method that provides this information can be implemented. I am
trying to implement it, but I am not familiar with the concepts of the LLVM
back end. Could anyone explain them?

 Thanks,

 Keun Soo 


Keun Soo Yim

2008-Dec-15 02:31 UTC

[LLVMdev] intra- and inter-procedural control flow analysis

Hi,

As I use LLVM more and more, I appreciate its capability and the beauty of
its design. Thanks a lot!

For intra-procedural and inter-procedural control flow analysis, the
succ_iter() and use_begin() iterators are perfect.

Then what about link-time inter-file control flow analysis? I used llvm-link
to combine multiple bc files into one, which is then analyzed by llc, but as
expected this does not give the inter-file dependencies.

I know that LLVM supports link-time optimization; does it also support
inter-file control flow analysis? Would extracting this information by
analyzing the target operand of every call instruction be a solution? Thank
you for reading!
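
The idea of deriving inter-file control flow from call targets can be
sketched as below. This is a toy model and does not use the LLVM API:
CallSite, buildCallGraph, and reachable are hypothetical names, each inner
vector stands for the call instructions collected from one file, and only
direct calls are covered, since a call through a function pointer has no
statically named target.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical summary of one call instruction: the enclosing function and
// the function named by the call's target operand.
struct CallSite {
    std::string caller;
    std::string callee;
};

typedef std::map<std::string, std::set<std::string> > CallGraph;

// Merge the call sites collected from every file into a single graph.
CallGraph buildCallGraph(const std::vector<std::vector<CallSite> >& files) {
    CallGraph graph;
    for (const std::vector<CallSite>& file : files)
        for (const CallSite& site : file)
            graph[site.caller].insert(site.callee);
    return graph;
}

// Functions reachable from `root` by following direct calls (root included).
std::set<std::string> reachable(const CallGraph& graph,
                                const std::string& root) {
    std::set<std::string> seen;
    seen.insert(root);
    std::vector<std::string> worklist(1, root);
    while (!worklist.empty()) {
        const std::string fn = worklist.back();
        worklist.pop_back();
        CallGraph::const_iterator it = graph.find(fn);
        if (it == graph.end()) continue;  // leaf or external function
        for (const std::string& callee : it->second)
            if (seen.insert(callee).second) worklist.push_back(callee);
    }
    return seen;
}
```

Because the graph is keyed by function name, an edge from a caller in one
file to a callee defined in another appears once the per-file lists are
merged.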

 

 - Keun Soo


John Criswell

2008-Dec-15 16:11 UTC

[LLVMdev] How to correlate LLVA with native ISA

Keun Soo Yim wrote:
>
>  Thank your for reply.

I know this reply may be somewhat irrelevant because of your next post,
but I thought I'd answer it anyway to get a better idea of what you're
doing and to provide insight on the LLVM passes.

>
>  The reason why these information are needed is that I am trying to
> extract the program signature (e.g., control flow) out side of the
> binary. Conventional compiler technique adds extra checking code into
> the target source or target IR in an invasive manner. Since code
> generator combines the added code with the original one, they don't
> need to correlate these two information.

I'm not sure I understand what you are saying.  Are you trying to create
a signature for a program by recording which control-flow paths it takes
during execution?

Assuming I understand what you are trying to do correctly, how else are
you doing this besides instrumentation?  It seems to me that you're
adding instructions to the program, regardless of whether you're doing
it at the source level, LLVM IR, or machine code level.

>
>   It is being implemented as an LLVM analysis pass by using
> /MachineFunctionPass/. If it is MachineBasicBlock or
> MachineInstruction, does it possible to get the starting address of
> real basic block and the exact (runtime) address of real instruction?

I'm not sure.

>
>   Unlike the FunctionPass, my MachineFunctionPass gets an error when
> it is loaded by opt. The class has constructor, virtfn,
> runOnMachineFunction, getPassName, runOnFunction, and getAnalysisUsage
> methods where body parts of all methods are empty. Does any have a
> similar problem?

The opt program can only manipulate LLVM IR; it does not do any code
generation at all.  That is why you cannot load a MachineFunctionPass
into it.  A MachineFunctionPass (and all of the classes whose name start
with Machine) are used for code generation, which is done by llc.

You might be able to load MachineFunctionPass'es into llc, but I'm not
sure.
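
The two undefined symbols reported earlier in the thread can be demangled to
see what the plugin was missing. The sketch below assumes a GCC or Clang
toolchain, which provides abi::__cxa_demangle in <cxxabi.h>; the demangle
wrapper is a hypothetical helper.

```cpp
#include <cxxabi.h>  // abi::__cxa_demangle (GCC/Clang, Itanium C++ ABI)
#include <cassert>
#include <cstdlib>
#include <string>

// Demangle an Itanium-ABI symbol name; return the input unchanged if the
// name cannot be demangled.
std::string demangle(const std::string& mangled) {
    int status = 0;
    char* out = abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
    if (status != 0 || out == nullptr) return mangled;
    std::string result(out);
    std::free(out);  // the buffer is allocated with malloc
    return result;
}
```

Calling demangle("_ZTIN4llvm19MachineFunctionPassE") yields "typeinfo for
llvm::MachineFunctionPass", and
demangle("_ZN4llvm19MachineFunctionPass6virtfnEv") yields
"llvm::MachineFunctionPass::virtfn()". Both names belong to the
MachineFunctionPass class itself, which is consistent with the explanation
above that opt does not include the code generation classes.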

-- John T.
