thr3ads.net - llvm dev - [LLVMdev] Python bindings available. [May 2008]

Mahadevan R

2008-May-10 09:44 UTC

[LLVMdev] Python bindings available.

Hi all,

I'd like to announce the availability of Python bindings for LLVM.

It is built over llvm-c, and currently exposes enough APIs to build an
in-memory IR (and dump it!). It needs LLVM 2.3 latest and Python 2.5
(2.4 should be sufficient, but I haven't tested). Tested only on
Linux/i386.

Would love to hear your comments.

[Needless to say, it's all work in progress, but mostly it works as
expected. More tests, documentation and APIs will follow.]

It's all here: http://mdevan.nfshost.com/llvm-py.html

Thanks & Regards,
-Mahadevan.

Gordon Henriksen

2008-May-10 16:27 UTC

On May 10, 2008, at 05:44, Mahadevan R wrote:
> I'd like to announce the availability of Python bindings for LLVM.
>
> It is built over llvm-c, and currently exposes enough APIs to build an
> in-memory IR (and dump it!). It needs LLVM 2.3 latest and Python 2.5
> (2.4 should be sufficient, but I haven't tested). Tested only on
> Linux/i386.
>
> Would love to hear your comments.
>
> [Needless to say, it's all work in progress, but mostly it works as
> expected. More tests, documentation and APIs will follow.]
>
> It's all here: http://mdevan.nfshost.com/llvm-py.html

Hi Mahadevan,

Very nice! The OO syntax is pleasantly succinct. :)

> Constant.string(value, dont_null_terminate) -- value is a string
> Constant.struct(consts, packed) -- a struct, consts is a list of
> other constants, packed is boolean

I did this in OCaml initially, but found the boolean constants pretty
confusing to read in code. I kept asking "What's that random true
doing there?" Therefore, the OCaml bindings expose these as
const_string/const_stringz and const_struct/const_packed_struct
respectively. I figure the user can always write her own in the (very)
rare cases where it is necessary to conditionalize such things:

     let const_string_maybez nullterm =
       if nullterm then const_stringz else const_string

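A rough Python rendering of the same idea, as a sketch only: the Constant
class below is a hypothetical stand-in, not the llvm-py API, and the
method names string/stringz are assumptions chosen to mirror the OCaml
names above.

```python
# Hypothetical stand-in for the binding's Constant class; not the llvm-py API.
class Constant:
    def __init__(self, kind, payload):
        self.kind = kind
        self.payload = payload

    @classmethod
    def string(cls, value):
        """Byte string without a trailing NUL."""
        return cls("string", value.encode("ascii"))

    @classmethod
    def stringz(cls, value):
        """Byte string with a trailing NUL appended."""
        return cls("string", value.encode("ascii") + b"\0")


def string_maybez(nullterm):
    """Pick a constructor at run time, for the rare caller that needs to."""
    return Constant.stringz if nullterm else Constant.string


hello = Constant.stringz("hi")
raw = string_maybez(False)("hi")
```

Call sites then read Constant.stringz("hi") rather than
Constant.string("hi", False), so no bare boolean needs decoding.
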
> Memory Buffer and Type Handles are not yet implemented.

:) Type handles in particular are very important. You can't form a  
recursive type without using them, so you can't build any sort of data  
structure.

> Builder wraps an llvm::IRBuilder object. It is created with the
> static method new (builder = Builder.new()).

Uninitialized builders are very dangerous (they leak instructions if
you use them), so you might want to add overloads for new in order to
avoid boilerplate code.

> It can be positioned using the methods position(block, instr=None),
> position_before(instr) and position_at_end(block).

There's an "IR navigator" concept you can implement to avoid writing
so many overloads here. It provides a complete "position" or
"iterator" concept. It's not entirely explicit in the C bindings; it
would be memory-inefficient if it were. But you can build it atop them
easily. It's useful whenever the C bindings have Before/AtEnd
functions, and you can implement it wherever you see
First/Last/Next/Prev functions. The C bindings support this for
functions, global variables, arguments, basic blocks, and instructions.

In OCaml, we coded it up using a variant type, like (Before element |
At_end parent). The basic operations for forward iteration are
Parent.begin and Element.succ, which were implemented like this:

     Parent.begin =
       if this.first_element is null
         return At_end this
       else
         return Before this.first_element

     Element.succ =
       if this.next_element is null
         return At_end this.parent
       else
         return Before this.next_element

Then the user could build many IR navigation algorithms. The simplest  
one, "for each", is thus:

     for_elements(parent, callback) =
       pos = parent.begin
       loop
         match pos with
         | At_end _ -> break
         | Before element ->
             callback(element)
             pos = element.succ

     for_elements(parent, do_my_thing)
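
For readers following along in Python, here is a minimal sketch of the
navigator over toy objects. Block, Instr, Before and AtEnd are
illustrative names invented for this example; real bindings would read
the first/next links through the C API instead of Python attributes.

```python
from collections import namedtuple

# Position variants, mirroring (Before element | At_end parent).
Before = namedtuple("Before", "element")
AtEnd = namedtuple("AtEnd", "parent")


class Instr:
    """Toy element: knows its parent and its next sibling."""
    def __init__(self, name, parent):
        self.name, self.parent, self.next = name, parent, None

    def succ(self):
        return AtEnd(self.parent) if self.next is None else Before(self.next)


class Block:
    """Toy parent holding a singly linked list of elements."""
    def __init__(self, names):
        self.first = None
        prev = None
        for name in names:
            instr = Instr(name, self)
            if prev is None:
                self.first = instr
            else:
                prev.next = instr
            prev = instr

    def begin(self):
        return AtEnd(self) if self.first is None else Before(self.first)


def for_elements(parent, callback):
    """Visit every element of parent in order."""
    pos = parent.begin()
    while isinstance(pos, Before):
        callback(pos.element)
        pos = pos.element.succ()


seen = []
for_elements(Block(["a", "b", "c"]), lambda i: seen.append(i.name))
```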

This representation was idiomatic in a functional language because  
it's compatible with recursion (you can translate for_elements into a  
tail recursive loop), but perhaps an enumerator class would be more  
idiomatic in Python:

     for_elements(parent, callback) =
       pos = parent.begin
       while pos.has_next()
         callback(pos.next())

The upshot, aside from being able to iterate the IR, was that it's  
easy to create builders anywhere with just one overload:

     // At the start or end of a BB:
     Builder.new(At_end bb)
     Builder.new(bb.begin)

     // Before or after a given instruction:
     Builder.new(Before instr)
     Builder.new(instr.succ)

This is actually more succinct than C++ because unlike  
BasicBlock::iterator, the position always knows its parent element  
(it's either parent or element.parent), so there's no need to pass it  
in separately as in builder.position(block, instr). Also, this could  
return a precise position:
> The current block is returned via the r/o property insert_block.
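
To make the single-overload idea concrete, here is a small Python sketch
with a toy builder. Everything in it (Builder, Block, Instr, emit, and
the list-backed block) is hypothetical and only illustrates how one
position argument can replace the three positioning methods and still
answer insert_block.

```python
from collections import namedtuple

Before = namedtuple("Before", "element")   # insert ahead of this instruction
AtEnd = namedtuple("AtEnd", "parent")      # append to this block


class Block:
    def __init__(self):
        self.instrs = []


class Instr:
    def __init__(self, name, parent):
        self.name, self.parent = name, parent


class Builder:
    """Toy builder: one constructor argument fully determines the position."""
    def __init__(self, pos):
        self.pos = pos

    @property
    def insert_block(self):
        # The position always knows its block, directly or via the element.
        if isinstance(self.pos, AtEnd):
            return self.pos.parent
        return self.pos.element.parent

    def emit(self, name):
        block = self.insert_block
        instr = Instr(name, block)
        if isinstance(self.pos, AtEnd):
            block.instrs.append(instr)
        else:
            block.instrs.insert(block.instrs.index(self.pos.element), instr)
        return instr


bb = Block()
ret = Builder(AtEnd(bb)).emit("ret")
Builder(Before(ret)).emit("add")
order = [i.name for i in bb.instrs]
```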

Finally, just as the C++ STL has reverse_iterator, it did prove  
necessary to have a separate (At_begin parent | After element) type in  
order to walk the IR backwards.

Cheers,
Gordon

Mahadevan R

2008-May-11 11:36 UTC

Hi Gordon,

Thanks for your comments.
>  > Constant.string(value, dont_null_terminate) -- value is a string
>  > Constant.struct(consts, packed) -- a struct, consts is a list of
>  > other constants, packed is boolean
>
>  I did this in Ocaml initially, but found the boolean constants pretty
>  confusing to read in code. I kept asking "What's that random true
>  doing there?" Therefore, the bindings expose these as const_string/
>  const_stringz and const_struct/const_packed_struct respectively. I
OK, will do.
>  :) Type handles in particular are very important. You can't form a
>  recursive type without using them, so you can't build any sort of data
>  structure.
On it already. BTW, where can I find a good example of how to use it?
>  Uninitialized builders are very dangerous (they leak instructions if
>  you use them), so you might want to add overloads for new in order to
>  avoid boilerplate code.
By 'uninitialized', I guess you're referring to builders that are not
yet positioned on a block/instruction? Maybe it makes more sense to
create it 'from' a block, something like:

builder = basic_block_obj.builder()

with it being positioned at the end of the block by default. But then,
your OCaml syntax is much cleaner:
>      // At the start or end of a BB:
>      Builder.new(At_end bb)
>      Builder.new(bb.begin)
>
>      // Before or after a given instruction:
>      Builder.new(Before instr)
>      Builder.new(instr.succ)
so I'll see how this can be done a bit, ah, Pythonically.
>  Finally, just as the C++ STL has reverse_iterator, it did prove
>  necessary to have a separate (At_begin parent | After element) type in
>  order to walk the IR backwards.
Well, it's possible to do:

for inst in reversed(block.instructions):
  # do stuff with inst

which will iterate backwards over the instructions of a block.
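
As a sketch of what has to sit behind that loop: reversed() works on an
object that defines __reversed__, so a view over the prev/next links can
walk backwards without copying into a list. InstructionList and Instr
below are toy classes written for this example, not the actual llvm-py
implementation.

```python
class Instr:
    """Toy instruction with the prev/next links the C bindings expose."""
    def __init__(self, name):
        self.name, self.prev, self.next = name, None, None


class InstructionList:
    """Hypothetical view over a block's instructions (not the llvm-py class)."""
    def __init__(self, names):
        self.first = self.last = None
        for name in names:
            instr = Instr(name)
            if self.last is None:
                self.first = instr
            else:
                self.last.next, instr.prev = instr, self.last
            self.last = instr

    def __iter__(self):
        node = self.first
        while node is not None:
            yield node
            node = node.next

    def __reversed__(self):
        # reversed() calls this, so no list copy and no __len__ is needed.
        node = self.last
        while node is not None:
            yield node
            node = node.prev


instrs = InstructionList(["load", "add", "store"])
forward = [i.name for i in instrs]
backward = [i.name for i in reversed(instrs)]
```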

Thanks & Regards,
-Mahadevan.

Gordon Henriksen

2008-May-12 01:37 UTC

On May 10, 2008, at 05:44, Mahadevan R wrote:
> I'd like to announce the availability of Python bindings for LLVM.
>
> It is built over llvm-c, and currently exposes enough APIs to build  
> an in-memory IR (and dump it!). It needs LLVM 2.3 latest and Python  
> 2.5 (2.4 should be sufficient, but I haven't tested). Tested only on  
> Linux/i386.
>
> Would love to hear your comments.
>
> [Needless to say, it's all work in progress, but mostly it works as  
> expected. More tests, documentation and APIs will follow.]

Hi Mahadevan,

One more thing I noticed that may be a problem. Automatic finalizers  
like this one are very dangerous when cooperating with the C++ object  
model:

void dtor_LLVMModuleRef(void *p)
{
     LLVMModuleRef m = (LLVMModuleRef)p;
     LLVMDisposeModule(m);
}

Consider the case where a function creates and populates a Module,  
stuffs it in an ExistingModuleProvider for the JIT, then returns the  
ModuleProvider, dropping direct reference to the Module.  
(ModuleProvider takes ownership of the Module.) I presume that your  
Python object is under the impression it owns the Module; when that  
goes out of scope, its refcount goes to zero and it invokes its dtor,  
disposing of the Module. D'oh— now the ModuleProvider has a dangling  
pointer. :) The routine LLVMModuleRef LLVMGetGlobalParent(LLVMValueRef  
Global); poses a related problem; in this case, the returned reference  
is non-owning, so you must not dtor it from Python.

The fix, of course, is providing a dispose routine and requiring the  
user to call it, since you can't know what they've done with the  
pointer. Luckily, the IR is not subject to these subtleties. None of  
your LLVMValueRef wrappers need destructors, either manual or  
automatic, because LLVMDisposeModule will destroy the contained  
objects recursively.

Builders and type handles are unlikely to ever be subject to these  
sorts of circumstances, though, so letting Python garbage collect them  
is advisable.
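
A minimal Python sketch of the explicit-dispose approach, under stated
assumptions: fake_dispose_module stands in for LLVMDisposeModule, and
the owned flag and release method are hypothetical names for "this
wrapper may free the handle" and "ownership was handed elsewhere".

```python
disposed = []          # records which handles the fake C layer freed


def fake_dispose_module(handle):
    """Stand-in for LLVMDisposeModule; a double call here would be a bug."""
    assert handle not in disposed, "double free"
    disposed.append(handle)


class Module:
    """Toy wrapper with explicit, idempotent disposal and no __del__."""
    def __init__(self, handle, owned=True):
        self.handle, self.owned = handle, owned

    def release(self):
        """Give up ownership, e.g. after handing the module to a provider."""
        self.owned = False

    def dispose(self):
        if self.owned:
            self.owned = False
            fake_dispose_module(self.handle)


m = Module(handle=1)
m.dispose()
m.dispose()            # second call is a no-op

borrowed = Module(handle=2, owned=False)   # e.g. result of a parent lookup
borrowed.dispose()     # never frees: the wrapper does not own the handle

given_away = Module(handle=3)
given_away.release()   # a provider now owns handle 3
given_away.dispose()   # no-op for the same reason
```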

— Gordon

Mahadevan R

2008-May-12 06:58 UTC

> Consider the case where a function creates and populates a Module, stuffs it
> in an ExistingModuleProvider for the JIT, then returns the ModuleProvider,
> dropping direct reference to the Module. (ModuleProvider takes ownership of
> the Module.) I presume that your Python object is under the impression it
> owns the Module; when that goes out of scope, its refcount goes to zero and
> it invokes its dtor, disposing of the Module. D'oh, now the ModuleProvider
> has a dangling pointer. :)
Ah. Good one. Would the following fix it?

1) Have ModuleProvider maintain a reference to the Module it owns,
     so that the ref count is at least 1 at any time. This is easily done.
     The only thing left is when an MP goes away, the module's dtor
     will be called first, deleting the module, then the MP's dtor will
     be called, which will try to delete the same module again.

2a) So either we can prevent the actual destruction of modules that
    are attached to MPs, or

2b) Do not do anything in the dtors of MPs (while letting the dtor of
    modules do the work)

Both options have the disadvantage of assuming the C/C++ implementation
(like MP::dtor deletes only the module and nothing else).
> The routine LLVMModuleRef
> LLVMGetGlobalParent(LLVMValueRef Global); poses a related problem; in this
> case, the returned reference is non-owning, so you must not dtor it from
> Python.
If I do this:

  m1 = Module.new()
  g1 = m1.add_global_variable(ty, "name")
  m2 = g1.module

will the LLVMModuleRef pointer returned in the last call be the
same as that of m1? If so, we can probably get "g1.module" to
return the original object itself.
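
One way this could work, sketched with toy classes and assuming the
pointer values do match: keep a weak map from handle to wrapper and
consult it before creating a new wrapper. The from_handle method and the
integer handles are hypothetical; in real bindings the handle would come
from LLVMGetGlobalParent.

```python
import weakref


class Module:
    """Toy wrapper; `handle` stands in for the LLVMModuleRef pointer value."""
    _live = weakref.WeakValueDictionary()   # handle -> existing wrapper

    def __init__(self, handle):
        self.handle = handle

    @classmethod
    def from_handle(cls, handle):
        # Reuse the wrapper already made for this pointer, if it is alive.
        wrapper = cls._live.get(handle)
        if wrapper is None:
            wrapper = cls(handle)
            cls._live[handle] = wrapper
        return wrapper


class GlobalVariable:
    def __init__(self, parent_handle):
        self._parent_handle = parent_handle

    @property
    def module(self):
        # In real bindings this handle would come from LLVMGetGlobalParent.
        return Module.from_handle(self._parent_handle)


m1 = Module.from_handle(0x1000)
g1 = GlobalVariable(parent_handle=0x1000)
m2 = g1.module
```

Because the map holds only weak references, it does not keep a wrapper
alive by itself; it only prevents two live wrappers for one pointer.
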
> The fix, of course, is providing a dispose routine and requiring the user to
> call it, since you can't know what they've done with the pointer.
It'd be much easier to use it without an explicit destruction call.
I'd prefer to do it only if there's absolutely no other way.

Regards,
-Mahadevan.
