thr3ads.net - llvm dev - [LLVMdev] A number of newbie questions [Jan 2006]

Marcel Weiher

2006-Jan-09 00:43 UTC

[LLVMdev] A number of newbie questions

Hi,

I am currently experimenting with LLVM to provide native code
compilation services for a project of mine I call
Objective-Smalltalk, and so far I am quite pleased with the results.
I was able to JIT-compile some functions that send Objective-C
messages, and now look forward to compiling full methods.

I do have a couple of questions that I haven't been able to answer  
after looking through what I think is the available documentation:

1.	Executable size

Executables appear to be gargantuan: a framework that wraps the parts
required for the above functionality weighs in at 13 MB fully
stripped ( -x ) and at 72 MB (!) with debugging symbols.  Is there
any way of significantly reducing this size, at present or planned in
the future?

2.	Global (Function) naming

It appears that I have to give 'functions' a global/module visible  
name in order to create them, which is a bit odd for the case of  
compiling methods, as their "name" is really more a function of where
they get stuffed in the method table of the class in question,  
something I might not even know at the time I am compiling the  
method.  Also these names seem to actually exist in the global  
function/symbol namespace of the running program, or at least  
interact with it.

I currently just synthesize a dummy name from the address of the  
object in question, but that's really a bit of a hack.  Is there some  
way of interacting with LLVM without having to interact with this  
global namespace?
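
A minimal sketch of the workaround described above, in plain C++ with no LLVM dependency: it derives a placeholder symbol name from an object's address. The helper name `dummyNameFor` and the `method_` prefix are assumptions made for illustration and do not come from the original code.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper (not an LLVM API): builds a placeholder symbol
// name from an object's address, e.g. "method_0x7ffee4c0".
// Two distinct live objects always get distinct names, but a name can
// recur once an object is freed and its address is reused.
std::string dummyNameFor(const void *object) {
    char buffer[32];  // ample room for "0x" plus 16 hex digits and NUL
    std::snprintf(buffer, sizeof buffer, "0x%llx",
                  static_cast<unsigned long long>(
                      reinterpret_cast<std::uintptr_t>(object)));
    return std::string("method_") + buffer;
}
```

Calling `dummyNameFor(&someMethodObject)` yields a string that could be passed wherever a function name is required. The address-reuse caveat in the comment is one reason this approach feels like a hack.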

3.	Modules / JITs / functions

As far as I can tell, I need a 'Module' in order to create a  
function, at least that's the only way I've been able to make it work  
so far, but I am not really clear why this should be the case.  Of  
course, I also need this Module to create the JIT (or do I?).  I've  
now made the Module (or rather my wrapper) a singleton, effectively a  
global, but I don't feel very comfortable about it.  I also
remember some issues with not being able to create a second JIT
later, so it seems like one module per lifetime of a process that  
wants to do jitting.

Is this correct or am I missing something?

4.	Jitted functions / ownership / memory

Once a function is jitted I can get a function pointer to it and call  
it, that's great.  Can I also find out how long it is, for example if  
I wanted to write an object file?  All in all, the jit-result seems  
to be fairly opaque and hidden.  Is this intentional, or is there  
more I am missing?


Thanks,

Marcel

Chris Lattner

2006-Jan-09 19:49 UTC

[LLVMdev] A number of newbie questions

On Mon, 9 Jan 2006, Marcel Weiher wrote:
> I am currently experimenting with LLVM to provide native code
> compilation services for a project of mine I call Objective-Smalltalk, 
> and so far quite pleased with the results.  I was able to JIT-compile 
> some functions that send Objective-C messages, and now look forward to 
> compiling full methods.
Cool!
> I do have a couple of questions that I haven't been able to answer after
> looking through what I think is the available documentation:
>
> 1.	Executable size
>
> Executables appear to be gargantuan, a framework that wraps the parts 
> required for the above functionality weighs in at 13 MB fully stripped ( 
> -x ) and at 72 MB (!) with debugging symbols.  Is there any way of 
> significantly reducing this size, at present or planned in the future?
It depends on what you're building.  A release build of LLVM (make 
ENABLE_OPTIMIZED=1, with the results in llvm/Release) is significantly 
smaller than a debug build.  Even with that, however, the binaries are 
larger than they should be (5M?).  No one has spent the time to track down
why this is to my knowledge.
> 2.	Global (Function) naming
>
> It appears that I have to give 'functions' a global/module visible name
> in order to create them, which is a bit odd for the case of compiling
> methods, as their "name" is really more a function of where they get
> stuffed in the method table of the class in question, something I might 
> not even know at the time I am compiling the method.  Also these names 
> seem to actually exist in the global function/symbol namespace of the 
> running program, or at least interact with it.
You can use "" for the name.  Multiple functions are allowed to have ""
as a name without problem.
> I currently just synthesize a dummy name from the address of the object 
> in question, but that's really a bit of a hack.  Is there some way of 
> interacting with LLVM without having to interact with this global 
> namespace?
Yup :)
> 3.	Modules / JITs / functions
>
> As far as I can tell, I need a 'Module' in order to create a function,
> at least that's the only way I've been able to make it work so far, but
> I am not really clear why this should be the case.
Yes, Function objects must be embedded into Module objects for the LLVM 
code to be well formed.
> Of course, I also 
> need this Module to create the JIT (or do I?).
Yes, the JIT does need a module to know where to get code to compile from.
> I've now made the Module 
> (or rather my wrapper) a singleton, effectively a global, but I don't 
> feel very comfortable about it.
This should work.  Think of it as just a container for the LLVM code you
are creating.
> I also remember some issues with not being able to create a second
> JIT later, so it seems like one module per lifetime of a process that 
> wants to do jitting.
I'm not sure what you mean here.
> 4.	Jitted functions / ownership / memory
>
> Once a function is jitted I can get a function pointer to it and call 
> it, that's great.  Can I also find out how long it is, for example if I
> wanted to write an object file?
> All in all, the jit-result seems to be 
> fairly opaque and hidden.  Is this intentional, or is there more I am 
> missing?
There are ways, but there isn't an elegant public interface for this yet.
For a couple of reasons, it is tricky to JIT code to memory, then wrap it 
up into an object file (in particular, the JIT'd code is already 
relocated).  The start of a direct ELF writer is available in 
lib/CodeGen/ELFWriter.cpp, but it is not complete yet.  It uses the same 
codegen interfaces as the JIT to do the writing.

-Chris

-- 
http://nondot.org/sabre/
http://llvm.org/

Marcel Weiher

2006-Jan-09 22:52 UTC

[LLVMdev] A number of newbie questions

Hi Chris,

thanks for your answers!

[large executables]
> It depends on what you're building.  A release build of LLVM (make  
> ENABLE_OPTIMIZED=1, with the results in llvm/Release) is  
> significantly smaller than a debug build.  Even with that, however,  
> the binaries are larger than they should be (5M?).  No one has spent
> the time to track down why this is to my knowledge.
OK, 5 MB seems a lot better; I'll try doing a release build to see if
that gets me to that point.

Ahh...yes, one thing I like about multi-CPU machines is that they  
make background compiles very smooth.  Anyway, the framework is now  
down to 5 MB, 4 MB after stripping with -x, and that compresses down  
to around 1.1 MB with gzip, so quite good enough for now.  Lovely!

[thanks for the ""-function-name trick]
>> 3.	Modules / JITs / functions
[...]
>> I've now made the Module (or rather my wrapper) a singleton,
>> effectively a global, but I don't feel very comfortable about it.
>
> This should work.  Think of it as just a container for the LLVM code
> you are creating.
Yeah, but I really don't like globals, especially if they accumulate  
stuff as this one does.  It would be *great* if there were a way to  
isolate these guys, but I haven't found one yet.
>> I also remember some issues with not being able to create a
>> second JIT later, so it seems like one module per lifetime of a  
>> process that wants to do jitting.
>
> I'm not sure what you mean here.
In my unit test code, I tried to allocate a new JIT for each test in  
order to isolate the tests (not really a conscious decision, more  
standard operating procedure).  The program crashed once I tried to  
use the second allocated JIT.

Combining this (possibly flawed) observation with the fact that a JIT  
has to be initialized with a module, it seems that you can only have  
a single module in a process (as having a second module would require  
a second JIT).

It is quite likely that I was doing something wrong at the time;
these were my very first baby steps.  But from what I've gleaned, it
*appears* that LLVM sort of expects these to be pretty much
singletons, or at the very least some sort of hierarchical invocation
as you would see in a command line compiler, and it also expects a
process to do a (big) compilation job and then exit.  Is this
impression correct or am I misinterpreting my initial experiences?

>> 4.	Jitted functions / ownership / memory
>>
>> Once a function is jitted I can get a function pointer to it and  
>> call it, that's great.  Can I also find out how long it is, for  
>> example if I wanted to write an object file?
>> All in all, the jit-result seems to be fairly opaque and hidden.   
>> Is this intentional, or is there more I am missing?
>
> There are ways, but there isn't an elegant public interface for  
> this yet.
> For a couple of reasons, it is tricky to JIT code to memory, then  
> wrap it up into an object file (in particular, the JIT'd code is  
> already relocated).
OK, writing an object file was possibly not the best example, but it  
would be good to be able to take control of the result and control  
its lifecycle.  For example, imagine an IDE-type environment where  
you want to overwrite a particular method (and not necessarily with  
code coming from LLVM).
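
To make the scenario concrete, here is a rough sketch in plain C++ of a per-class method table whose entries can be replaced at runtime. All names here (`MethodTable`, `MethodImpl`, `install`, `lookup`, and the stand-in functions) are hypothetical and belong neither to LLVM nor to the Objective-C runtime. The table only stores function pointers, so it does not care whether an implementation came from the JIT or from somewhere else.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical implementation signature, chosen only for illustration.
using MethodImpl = int (*)(int);

// Hypothetical per-class method table keyed by selector name.
class MethodTable {
public:
    // Installs or replaces the implementation for a selector and
    // returns the previous one (nullptr if there was none), so the
    // caller can decide what to do with the old code.
    MethodImpl install(const std::string &selector, MethodImpl impl) {
        MethodImpl previous = lookup(selector);
        table_[selector] = impl;
        return previous;
    }

    // Returns the current implementation, or nullptr if none is set.
    MethodImpl lookup(const std::string &selector) const {
        auto it = table_.find(selector);
        return it == table_.end() ? nullptr : it->second;
    }

private:
    std::unordered_map<std::string, MethodImpl> table_;
};

// Stand-ins for compiled code; in practice these pointers might come
// from a JIT or from any other code generator.
int doubled(int x) { return 2 * x; }
int tripled(int x) { return 3 * x; }
```

With this shape, `table.install("scale:", doubled)` followed by `table.install("scale:", tripled)` swaps the method and hands back the old pointer. The sketch does not free the memory behind the replaced code, which is exactly the lifecycle control I am asking about.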
> The start of a direct ELF writer is available in
> lib/CodeGen/ELFWriter.cpp, but it is not complete yet.  It uses the
> same codegen interfaces as the JIT to do the writing.
Very cool, will have to take a look at that...though what I will  
need, at least initially, is Mach-O, not ELF... :-)

Thanks again,

Marcel

-- 
Marcel Weiher                          Metaobject Software Technologies
marcel at metaobject.com         www.metaobject.com
The simplicity of power            HOM, IDEAs, MetaAd etc.
         1d480c25f397c4786386135f8e8938e4
