Hi all,

I'd like to announce the availability of Python bindings for LLVM.

It is built over llvm-c, and currently exposes enough APIs to build an in-memory IR (and dump it!). It needs LLVM 2.3 latest and Python 2.5 (2.4 should be sufficient, but I haven't tested). Tested only on Linux/i386.

Would love to hear your comments.

[Needless to say, it's all work in progress, but mostly it works as expected. More tests, documentation and APIs will follow.]

It's all here: http://mdevan.nfshost.com/llvm-py.html

Thanks & Regards,
-Mahadevan.

On May 10, 2008, at 05:44, Mahadevan R wrote:

> I'd like to announce the availability of Python bindings for LLVM.
>
> It is built over llvm-c, and currently exposes enough APIs to build an
> in-memory IR (and dump it!). It needs LLVM 2.3 latest and Python 2.5
> (2.4 should be sufficient, but I haven't tested). Tested only on
> Linux/i386.
>
> Would love to hear your comments.
>
> [Needless to say, it's all work in progress, but mostly it works as
> expected. More tests, documentation and APIs will follow.]
>
> It's all here: http://mdevan.nfshost.com/llvm-py.html

Hi Mahadevan,

Very nice! The OO syntax is pleasantly succinct. :)

> Constant.string(value, dont_null_terminate) -- value is a string
> Constant.struct(consts, packed) -- a struct, consts is a list of
> other constants, packed is boolean

I did this in Ocaml initially, but found the boolean constants pretty confusing to read in code. I kept asking “What's that random true doing there?” Therefore, the bindings expose these as const_string/const_stringz and const_struct/const_packed_struct respectively. I figure the user can always write her own in the (very) rare cases that it is necessary to conditionalize such things:

    let const_string_maybez nullterm =
      if nullterm then const_stringz else const_string

> Memory Buffer and Type Handles are not yet implemented.

:) Type handles in particular are very important. You can't form a recursive type without using them, so you can't build any sort of data structure.

> Builder wraps an llvm::IRBuilder object. It is created with the
> static method new (builder = Builder.new()).

Uninitialized builders are very dangerous (they leak instructions if you use them), so you might want to add overloads for new in order to avoid boilerplate code.

> It can be positioned using the methods position(block, instr=None),
> position_before(instr) and position_at_end(block).

There's an "IR navigator" concept you can implement to avoid writing so many overloads here.
It provides a complete "position" or "iterator" concept. It's not entirely explicit in the C bindings -- it would be memory-inefficient if it were. But you can build it atop them easily. It's useful whenever the C bindings have Before/AtEnd functions, and you can implement it wherever you see First/Last/Next/Prev functions. The C bindings support this for functions, global variables, arguments, basic blocks, and instructions.

In Ocaml, we coded it up using a variant type, like (Before element | At_end parent). The basic operations for forward iteration are Parent.begin and Element.succ, which were implemented like this:

    Parent.begin
      if this.first_element is null
        return At_end this
      else
        return Before this.first_element

    Element.succ
      if this.next_element is null
        return At_end this.parent
      else
        return Before this.next_element

Then the user could build many IR navigation algorithms. The simplest one, "for each", is thus:

    for_elements(parent, callback)
      pos = parent.begin
      loop
        match pos with
        | At_end _ -> break
        | Before element ->
            callback(element)
            pos = element.succ

    for_elements(parent, do_my_thing)

This representation was idiomatic in a functional language because it's compatible with recursion (you can translate for_elements into a tail recursive loop), but perhaps an enumerator class would be more idiomatic in Python:

    for_elements(parent, callback)
      pos = parent.begin
      while pos.has_next()
        callback(pos.current)
        pos.next()

The upshot, aside from being able to iterate the IR, was that it's easy to create builders anywhere with just one overload:

    // At the start or end of a BB:
    Builder.new(At_end bb)
    Builder.new(bb.begin)

    // Before or after a given instruction:
    Builder.new(Before instr)
    Builder.new(instr.succ)

This is actually more succinct than C++ because unlike BasicBlock::iterator, the position always knows its parent element (it's either parent or element.parent), so there's no need to pass it in separately as in builder.position(block, instr).
Also, this could return a precise position:

> The current block is returned via the r/o property insert_block.

Finally, just as the C++ STL has reverse_iterator, it did prove necessary to have a separate (At_begin parent | After element) type in order to walk the IR backwards.

Cheers,
Gordon
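
A minimal Python sketch of the position idea from the message above. Block, Instr, Before, AtEnd, begin, succ, elements and insert_at are toy names made up for illustration; none of them are llvm-py or llvm-c APIs, and a real binding would walk the IR through the First/Next functions of the C bindings rather than a Python list.

```python
class Block:
    """Toy basic block: an ordered list of instructions."""

    def __init__(self, name):
        self.name = name
        self.instrs = []

    def append(self, name):
        instr = Instr(name, self)
        self.instrs.append(instr)
        return instr


class Instr:
    """Toy instruction that knows its parent block."""

    def __init__(self, name, parent):
        self.name = name
        self.parent = parent


class Before:
    """Position immediately before `element`."""

    def __init__(self, element):
        self.element = element

    @property
    def parent(self):
        return self.element.parent


class AtEnd:
    """Position after the last element of `parent`."""

    def __init__(self, parent):
        self.parent = parent


def begin(block):
    """Parent.begin: first position in the block."""
    return Before(block.instrs[0]) if block.instrs else AtEnd(block)


def succ(instr):
    """Element.succ: the position just after `instr`."""
    siblings = instr.parent.instrs
    i = siblings.index(instr) + 1
    return Before(siblings[i]) if i < len(siblings) else AtEnd(instr.parent)


def elements(block):
    """Generator form of for_elements."""
    pos = begin(block)
    while isinstance(pos, Before):
        yield pos.element
        pos = succ(pos.element)


def insert_at(pos, name):
    """Insert a new instruction at `pos`; the position alone names the block."""
    block = pos.parent
    instr = Instr(name, block)
    if isinstance(pos, Before):
        block.instrs.insert(block.instrs.index(pos.element), instr)
    else:
        block.instrs.append(instr)
    return instr


bb = Block("entry")
a = bb.append("a")
bb.append("c")
insert_at(succ(a), "b")      # after a, which is the same as before c
insert_at(AtEnd(bb), "d")    # at the end of the block
print([i.name for i in elements(bb)])  # ['a', 'b', 'c', 'd']
```

Because every position can report its parent, insert_at needs only one argument for the location, which is the point made about Builder.new above. A generator is used here in place of an enumerator class since it is the usual Python way to expose such a walk.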

Hi Gordon,

Thanks for your comments.

> > Constant.string(value, dont_null_terminate) -- value is a string
> > Constant.struct(consts, packed) -- a struct, consts is a list of
> > other constants, packed is boolean
>
> I did this in Ocaml initially, but found the boolean constants pretty
> confusing to read in code. I kept asking "What's that random true
> doing there?" Therefore, the bindings expose these as const_string/
> const_stringz and const_struct/const_packed_struct respectively.

OK, will do.

> :) Type handles in particular are very important. You can't form a
> recursive type without using them, so you can't build any sort of data
> structure.

On it already. BTW, where can I find a good example of how to use it?

> Uninitialized builders are very dangerous (they leak instructions if
> you use them), so you might want to add overloads for new in order to
> avoid boilerplate code.

By 'uninitialized', I guess you're referring to builders that are not yet positioned on a block/instruction? Maybe it makes more sense to create it 'from' a block, something like:

    builder = basic_block_obj.builder()

with it being positioned at the end of the block by default. But then, your ocaml syntax is much cleaner:

> // At the start or end of a BB:
> Builder.new(At_end bb)
> Builder.new(bb.begin)
>
> // Before or after a given instruction:
> Builder.new(Before instr)
> Builder.new(instr.succ)

so I'll see how this can be done a bit, ah, Pythonically.

> Finally, just as the C++ STL has reverse_iterator, it did prove
> necessary to have a separate (At_begin parent | After element) type in
> order to walk the IR backwards.

Well, it's possible to do:

    for inst in reversed(block.instructions):
        # do stuff with inst

which will iterate backwards over the instructions of a block.

Thanks & Regards,
-Mahadevan.
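
One way the "OK, will do" above might look in Python is a pair of named constructors in place of the boolean argument. This is only a sketch: the Constant class below is a toy that stores raw bytes, and the names stringz and string_maybez are assumptions modelled on the OCaml const_string/const_stringz pair, not the actual llvm-py API.

```python
class Constant:
    """Toy stand-in for a constant wrapper; it only records raw bytes."""

    def __init__(self, data):
        self.data = data

    @classmethod
    def string(cls, value):
        """String constant without a trailing NUL byte."""
        return cls(value.encode("utf-8"))

    @classmethod
    def stringz(cls, value):
        """String constant with a trailing NUL byte, as C strings need."""
        return cls(value.encode("utf-8") + b"\x00")


def string_maybez(value, null_terminate):
    """The rare conditional case, written once by the caller."""
    make = Constant.stringz if null_terminate else Constant.string
    return make(value)


print(Constant.string("hi").data)
print(Constant.stringz("hi").data)
```

A call site then reads Constant.stringz("hi") instead of carrying an unexplained True or False.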

On May 10, 2008, at 05:44, Mahadevan R wrote:

> I'd like to announce the availability of Python bindings for LLVM.
>
> It is built over llvm-c, and currently exposes enough APIs to build
> an in-memory IR (and dump it!). It needs LLVM 2.3 latest and Python
> 2.5 (2.4 should be sufficient, but I haven't tested). Tested only on
> Linux/i386.
>
> Would love to hear your comments.
>
> [Needless to say, it's all work in progress, but mostly it works as
> expected. More tests, documentation and APIs will follow.]

Hi Mahadevan,

One more thing I noticed that may be a problem. Automatic finalizers like this one are very dangerous when cooperating with the C++ object model:

    void dtor_LLVMModuleRef(void *p)
    {
        LLVMModuleRef m = (LLVMModuleRef)p;
        LLVMDisposeModule(m);
    }

Consider the case where a function creates and populates a Module, stuffs it in an ExistingModuleProvider for the JIT, then returns the ModuleProvider, dropping direct reference to the Module. (ModuleProvider takes ownership of the Module.) I presume that your Python object is under the impression it owns the Module; when that goes out of scope, its refcount goes to zero and it invokes its dtor, disposing of the Module. D'oh -- now the ModuleProvider has a dangling pointer. :)

The routine

    LLVMModuleRef LLVMGetGlobalParent(LLVMValueRef Global);

poses a related problem; in this case, the returned reference is non-owning, so you must not dtor it from Python.

The fix, of course, is providing a dispose routine and requiring the user to call it, since you can't know what they've done with the pointer.

Luckily, the IR is not subject to these subtleties. None of your LLVMValueRef wrappers need destructors, either manual or automatic, because LLVMDisposeModule will destroy the contained objects recursively.

Builders and type handles are unlikely to ever be subject to these sorts of circumstances, though, so letting Python garbage collect them is advisable.
-- Gordon
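
The ownership hazard described above can be sketched in Python with an explicit dispose call and an ownership flag. Everything here is a toy under stated assumptions: handles are plain strings, fake_dispose_module stands in for LLVMDisposeModule, and the release method and owned parameter are hypothetical names, not part of llvm-py.

```python
disposed = []  # records which fake handles were freed, in order


def fake_dispose_module(handle):
    """Stand-in for LLVMDisposeModule; it only records the call."""
    disposed.append(handle)


class Module:
    """Toy wrapper that frees its handle only while it still owns it."""

    def __init__(self, handle, owned=True):
        self._handle = handle
        self._owned = owned

    def dispose(self):
        """Explicit destruction; a no-op for borrowed or released handles."""
        if self._owned:
            fake_dispose_module(self._handle)
            self._owned = False

    def release(self):
        """Give up ownership, e.g. when a provider takes the module over."""
        self._owned = False
        return self._handle


class ModuleProvider:
    """Toy provider that takes ownership of the module it is given."""

    def __init__(self, module):
        self._handle = module.release()

    def dispose(self):
        if self._handle is not None:
            fake_dispose_module(self._handle)
            self._handle = None


m = Module("m1")
mp = ModuleProvider(m)   # ownership moves to the provider
m.dispose()              # no-op: the wrapper no longer owns the handle
mp.dispose()             # frees the module exactly once

borrowed = Module("m2", owned=False)  # e.g. a handle from LLVMGetGlobalParent
borrowed.dispose()                    # no-op: never owned

print(disposed)  # ['m1']
```

The design choice is that nothing is freed from a finalizer: the wrapper frees a handle only on request and only while it is the owner, so a module handed to a provider is not disposed twice and a borrowed reference is never disposed at all.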

> Consider the case where a function creates and populates a Module, stuffs it
> in an ExistingModuleProvider for the JIT, then returns the ModuleProvider,
> dropping direct reference to the Module. (ModuleProvider takes ownership of
> the Module.) I presume that your Python object is under the impression it
> owns the Module; when that goes out of scope, its refcount goes to zero and
> it invokes its dtor, disposing of the Module. D'oh -- now the ModuleProvider
> has a dangling pointer. :)

Ah. Good one. Would the following fix it?

1) Have ModuleProvider maintain a reference to the Module it owns, so that the ref count is at least 1 at any time.

This is easily done. The only thing left is when an MP goes away, the module's dtor will be called first, deleting the module, then the MP's dtor will be called, which will try to delete the same module again.

2a) So either we can prevent the actual destruction of modules that are attached to MPs, or

2b) Do not do anything in the dtors of MPs (while letting the dtor of modules do the work)

Both options have the disadvantage of assuming the C/C++ implementation (like MP::dtor deletes only the module and nothing else).

> The routine LLVMModuleRef
> LLVMGetGlobalParent(LLVMValueRef Global); poses a related problem; in this
> case, the returned reference is non-owning, so you must not dtor it from
> Python.

If I do this:

    m1 = Module.new()
    g1 = m1.add_global_variable(ty, "name")
    m2 = g1.module

will the LLVMModuleRef pointer returned in the last call be the same as that of m1? If so probably we can get "g1.module" to return the original object itself.

> The fix, of course, is providing a dispose routine and requiring the user to
> call it, since you can't know what they've done with the pointer.

It'd be much easier to use it without an explicit destruction call. I'd prefer to do it only if there's absolutely no other go.

Regards,
-Mahadevan.
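
If the pointer returned for g1.module is indeed the same as m1's, one way to hand back "the original object itself" is a table of live wrappers keyed by pointer value. The sketch below assumes that, and uses integers as stand-in addresses; Module.from_address and the GlobalVariable constructor are hypothetical names, not llvm-py APIs.

```python
import weakref


class Module:
    """Toy wrapper around an integer standing in for an LLVMModuleRef."""

    # address -> live wrapper; weak values, so the table itself does not
    # keep a wrapper alive once the program has dropped it
    _live = weakref.WeakValueDictionary()

    def __init__(self, address):
        self.address = address

    @classmethod
    def from_address(cls, address):
        """Return the existing wrapper for `address`, or create one."""
        wrapper = cls._live.get(address)
        if wrapper is None:
            wrapper = cls(address)
            cls._live[address] = wrapper
        return wrapper


class GlobalVariable:
    """Toy global that remembers the address of its parent module."""

    def __init__(self, parent_address):
        self._parent_address = parent_address

    @property
    def module(self):
        # A real binding would get this address from LLVMGetGlobalParent.
        return Module.from_address(self._parent_address)


m1 = Module.from_address(0x1000)
g1 = GlobalVariable(0x1000)
m2 = g1.module
print(m2 is m1)
```

With a single wrapper per pointer, only one Python object can ever believe it owns the module, which removes the second-wrapper hazard for LLVMGetGlobalParent. It does not by itself settle who frees a module that has been handed to a ModuleProvider.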