Hi,

I am currently experimenting with LLVM to provide native code
compilation services for a project of mine I call Objective-Smalltalk,
and so far I am quite pleased with the results. I was able to
JIT-compile some functions that send Objective-C messages, and now look
forward to compiling full methods.

I do have a couple of questions that I haven't been able to answer after
looking through what I think is the available documentation:

1. Executable size

Executables appear to be gargantuan: a framework that wraps the parts
required for the above functionality weighs in at 13 MB fully stripped
(-x) and at 72 MB (!) with debugging symbols. Is there any way of
significantly reducing this size, at present or planned in the future?

2. Global (Function) naming

It appears that I have to give 'functions' a global/module visible name
in order to create them, which is a bit odd for the case of compiling
methods, as their "name" is really more a function of where they get
stuffed in the method table of the class in question, something I might
not even know at the time I am compiling the method. Also, these names
seem to actually exist in the global function/symbol namespace of the
running program, or at least interact with it.

I currently just synthesize a dummy name from the address of the object
in question, but that's really a bit of a hack. Is there some way of
interacting with LLVM without having to interact with this global
namespace?

3. Modules / JITs / functions

As far as I can tell, I need a 'Module' in order to create a function,
at least that's the only way I've been able to make it work so far, but
I am not really clear why this should be the case. Of course, I also
need this Module to create the JIT (or do I?). I've now made the Module
(or rather my wrapper) a singleton, effectively a global, but I don't
feel very comfortable about it.

I also remember some issues with not being able to create a second JIT
later, so it seems like one module per lifetime of a process that wants
to do jitting. Is this correct or am I missing something?

4. Jitted functions / ownership / memory

Once a function is jitted I can get a function pointer to it and call
it, that's great. Can I also find out how long it is, for example if I
wanted to write an object file?

All in all, the jit-result seems to be fairly opaque and hidden. Is this
intentional, or is there more I am missing?

Thanks,

Marcel

On Mon, 9 Jan 2006, Marcel Weiher wrote:
> I am currently experimenting with LLVM to provide native code
> compilation services for a project of mine I call Objective-Smalltalk,
> and so far I am quite pleased with the results. I was able to
> JIT-compile some functions that send Objective-C messages, and now look
> forward to compiling full methods.

Cool!

> I do have a couple of questions that I haven't been able to answer after
> looking through what I think is the available documentation:
>
> 1. Executable size
>
> Executables appear to be gargantuan: a framework that wraps the parts
> required for the above functionality weighs in at 13 MB fully stripped
> (-x) and at 72 MB (!) with debugging symbols. Is there any way of
> significantly reducing this size, at present or planned in the future?

It depends on what you're building. A release build of LLVM (make
ENABLE_OPTIMIZED=1, with the results in llvm/Release) is significantly
smaller than a debug build. Even with that, however, the binaries are
larger than they should be (5M?). To my knowledge, no one has spent the
time to track down why this is.

> 2. Global (Function) naming
>
> It appears that I have to give 'functions' a global/module visible name
> in order to create them, which is a bit odd for the case of compiling
> methods, as their "name" is really more a function of where they get
> stuffed in the method table of the class in question, something I might
> not even know at the time I am compiling the method. Also, these names
> seem to actually exist in the global function/symbol namespace of the
> running program, or at least interact with it.

You can use "" for the name. Multiple functions are allowed to have ""
as a name without problem.

> I currently just synthesize a dummy name from the address of the object
> in question, but that's really a bit of a hack. Is there some way of
> interacting with LLVM without having to interact with this global
> namespace?

Yup :)

> 3. Modules / JITs / functions
>
> As far as I can tell, I need a 'Module' in order to create a function,
> at least that's the only way I've been able to make it work so far, but
> I am not really clear why this should be the case.

Yes, Function objects must be embedded into Module objects for the LLVM
code to be well formed.

> Of course, I also
> need this Module to create the JIT (or do I?).

Yes, the JIT does need a module to know where to get code to compile
from.

> I've now made the Module
> (or rather my wrapper) a singleton, effectively a global, but I don't
> feel very comfortable about it.

This should work. Think of it as just a container for the LLVM code you
are creating.

> I also remember some issues with not being able to create a second
> JIT later, so it seems like one module per lifetime of a process that
> wants to do jitting.

I'm not sure what you mean here.

> 4. Jitted functions / ownership / memory
>
> Once a function is jitted I can get a function pointer to it and call
> it, that's great. Can I also find out how long it is, for example if I
> wanted to write an object file?
> All in all, the jit-result seems to be
> fairly opaque and hidden. Is this intentional, or is there more I am
> missing?

There are ways, but there isn't an elegant public interface for this
yet. For a couple of reasons, it is tricky to JIT code to memory and
then wrap it up into an object file (in particular, the JIT'd code is
already relocated).

The start of a direct ELF writer is available in
lib/CodeGen/ELFWriter.cpp, but it is not complete yet. It uses the same
codegen interfaces as the JIT to do the writing.

-Chris

-- 
http://nondot.org/sabre/
http://llvm.org/

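To make the two points from this reply concrete (several functions can
share the empty name "", and the Module is just a container for them),
here is a minimal C++ sketch. It deliberately does not use the LLVM API:
SketchModule, SketchFunction, addOne and timesTwo are hypothetical
stand-ins invented for illustration, and the plain function pointers
only stand in for generated code. The assumption being illustrated is
that callers identify a function by the handle they got back when
creating it, not by its name.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a compiled function; NOT an LLVM class.
struct SketchFunction {
    std::string name;   // may be empty, and need not be unique
    int (*body)(int);   // stands in for the generated code
};

// Hypothetical stand-in for a module: an ordinary object that owns its
// functions. It is a container, not a global namespace.
class SketchModule {
public:
    // Adds a function and returns a handle to it. The handle, not the
    // name, is how the caller refers to the function afterwards.
    SketchFunction* add(std::string name, int (*body)(int)) {
        functions_.push_back(std::make_unique<SketchFunction>(
            SketchFunction{std::move(name), body}));
        return functions_.back().get();
    }

    std::size_t size() const { return functions_.size(); }

private:
    std::vector<std::unique_ptr<SketchFunction>> functions_;
};

// Two trivial bodies standing in for compiled methods.
int addOne(int x) { return x + 1; }
int timesTwo(int x) { return x * 2; }
```

A caller would create a SketchModule as a local or member object, call
add("", addOne) and add("", timesTwo), and keep the two returned
pointers, for example by storing them in a class's method table. Both
functions carry the same empty name and remain distinct.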

Hi Chris,

thanks for your answers!

[large executables]

> It depends on what you're building. A release build of LLVM (make
> ENABLE_OPTIMIZED=1, with the results in llvm/Release) is
> significantly smaller than a debug build. Even with that, however,
> the binaries are larger than they should be (5M?). To my knowledge,
> no one has spent the time to track down why this is.

OK, 5 MB seems a lot better, I'll try doing a release build to see if
that gets me to that point. Ahh...yes, one thing I like about multi-CPU
machines is that they make background compiles very smooth. Anyway, the
framework is now down to 5 MB, 4 MB after stripping with -x, and that
compresses down to around 1.1 MB with gzip, so quite good enough for
now. Lovely!

[thanks for the ""-function-name trick]

>> 3. Modules / JITs / functions

[...]

>> I've now made the Module (or rather my wrapper) a singleton,
>> effectively a global, but I don't feel very comfortable about it.
>
> This should work. Think of it as just a container for the LLVM code
> you are creating.

Yeah, but I really don't like globals, especially if they accumulate
stuff as this one does. It would be *great* if there were a way to
isolate these guys, but I haven't found one yet.

>> I also remember some issues with not being able to create a
>> second JIT later, so it seems like one module per lifetime of a
>> process that wants to do jitting.
>
> I'm not sure what you mean here.

In my unit test code, I tried to allocate a new JIT for each test in
order to isolate the tests (not really a conscious decision, more
standard operating procedure). The program crashed once I tried to use
the second allocated JIT. Combined with the fact that a JIT has to be
initialized with a module, this (possibly flawed) observation suggests
that you can only have a single module in a process (as having a second
module would require a second JIT).

It is quite likely that I was doing something wrong at the time, these
were my very first baby steps, but from what I've gleaned it *appears*
that LLVM sort of expects these to be pretty much singletons, or at the
very least some sort of hierarchical invocation as you would see in a
command line compiler, and it also expects a process to do a (big)
compilation job and then exit. Is this impression correct or am I
misinterpreting my initial experiences?

>> 4. Jitted functions / ownership / memory
>>
>> Once a function is jitted I can get a function pointer to it and
>> call it, that's great. Can I also find out how long it is, for
>> example if I wanted to write an object file?
>> All in all, the jit-result seems to be fairly opaque and hidden.
>> Is this intentional, or is there more I am missing?
>
> There are ways, but there isn't an elegant public interface for
> this yet.
> For a couple of reasons, it is tricky to JIT code to memory and then
> wrap it up into an object file (in particular, the JIT'd code is
> already relocated).

OK, writing an object file was possibly not the best example, but it
would be good to be able to take control of the result and control its
lifecycle. For example, imagine an IDE-type environment where you want
to overwrite a particular method (and not necessarily with code coming
from LLVM).

> The start of a direct ELF writer is available in
> lib/CodeGen/ELFWriter.cpp, but it is not complete yet. It uses the
> same codegen interfaces as the JIT to do the writing.

Very cool, will have to take a look at that...though what I will need,
at least initially, is Mach-O, not ELF... :-)

Thanks again,

Marcel

-- 
Marcel Weiher                 Metaobject Software Technologies
marcel at metaobject.com      www.metaobject.com
The simplicity of power       HOM, IDEAs, MetaAd etc.
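
The IDE scenario mentioned in this message (overwriting one method at
run time, possibly with code that did not come from LLVM, while keeping
control over the old implementation) can be sketched in plain C++. This
is only an illustration under stated assumptions: MethodTable, Method,
install and send are hypothetical names, nothing here is LLVM or
Objective-C runtime API, and std::function stands in for whatever a
compiled method pointer would be.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical method type; stands in for a pointer to compiled code.
using Method = std::function<int(int)>;

// Hypothetical per-class method table keyed by selector.
class MethodTable {
public:
    // Installs impl under selector and hands the previous implementation
    // (empty if there was none) back to the caller, who then decides
    // when it is released.
    Method install(const std::string& selector, Method impl) {
        Method previous;
        auto it = table_.find(selector);
        if (it != table_.end()) {
            previous = std::move(it->second);
            it->second = std::move(impl);
        } else {
            table_.emplace(selector, std::move(impl));
        }
        return previous;
    }

    // Looks up the current implementation and calls it.
    // Throws std::out_of_range if the selector is unknown.
    int send(const std::string& selector, int arg) const {
        return table_.at(selector)(arg);
    }

private:
    std::unordered_map<std::string, Method> table_;
};
```

Returning the replaced implementation from install is one possible
design choice: the table never frees code behind the caller's back, so
an environment could keep the old version around for undo, or drop it
once no callers remain.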