Hello all,

Here is my GSoC 2012 proposal: Python bindings for LLVM. Any feedback is welcome!

*Title: Python bindings for LLVM*

*Abstract*

llvm-py provides Python bindings for LLVM. The latest llvm-py supports only Python 2.x and LLVM 2.x. This project will improve llvm-py so that it works with LLVM 3 under both Python 2.x and Python 3.

*Motivation*

LLVM is used as a static compiler backend and as a JIT backend on many platforms. It has a modular design and provides extensive optimization support. llvm-py provides Python bindings for LLVM [1]. It was started in 2008 and aims to expose enough of the LLVM APIs to implement a compiler backend in pure Python.

The latest llvm-py works only with LLVM 2.x, not LLVM 3. Since LLVM 3 brings several major changes, especially to the internal APIs, llvm-py must be updated to work with it. In addition, llvm-py currently supports only Python 2.x, not Python 3. Supporting Python 3 would make llvm-py more complete and make LLVM available to more users, which in turn helps its development.

This project therefore has two tasks: make llvm-py work with LLVM 3, and add Python 3 support.

*Project Detail*

Before writing this proposal, I read through the llvm-py source code to get a basic understanding of how it works, and wrote a short document analyzing its implementation (please see the appendix at the end of this proposal). This section lists the details relevant to this project: first those for working with LLVM 3, then those for Python 3 support.

*1. Working with LLVM 3*

LLVM 3 changes some internal APIs, so the llvm-py code must be changed to be consistent with them.

a. IR type system. The IR type system is reimplemented in LLVM 3. For instance, *OpaqueType* is gone, so it should also be removed from llvm-py.

b. Value class. Two new subclasses of Value are added: *ConstantDataArray*, an array constant, and *ConstantDataVector*, a vector constant. llvm-py should expose them.

c. Instruction class. Four new subclasses of Instruction are added:
   *FenceInst*, an instruction for ordering other memory operations;
   *AtomicCmpXchgInst*, an instruction that atomically checks and exchanges values in a memory location;
   *AtomicRMWInst*, an instruction that atomically reads a memory location, combines it with another value, and stores the result back;
   *LandingPadInst*, an instruction that holds the information needed to generate correct exception handling.
   llvm-py should support them.

d. Passes. Some passes are removed, for instance the *LowerSetJmp* pass. The corresponding API functions, such as LLVMAddLowerSetJmpPass, should also be removed from llvm-py.

e. PHINode. Two new functions are added to the PHINode class: *block_begin* and *block_end*. The list of incoming BasicBlocks can be accessed with them. At the same time, the reserveOperandSpace function is removed, so creating a PHINode now takes an extra argument that specifies how many operands to reserve space for.

When making llvm-py work with LLVM 3.0, we should focus on these changes. The list above may not be complete; I will cover more changes during the project.

*2. Python 3 support*

When adding support for Python 3, we should also pay attention to the C API changes between Python 2.x and Python 3. Here I list some of them.

1. Extension module initialization and finalization (PEP 3121) [2]
   In Python 3, the module initialization routine should look like this: *PyObject *PyInit_<modulename>()*. When creating a module, a struct PyModuleDef should be passed as a parameter.

2. Making PyObject_HEAD conform to standard C (PEP 3123) [3]
   Some macros are added, for instance *Py_TYPE*, *Py_REFCNT* and *Py_SIZE*. So an expression such as *func->ob_type->tp_name* in Python 2.x should be replaced with *Py_TYPE(func)->tp_name* in Python 3.

3. Byte vectors and String/Unicode Unification (PEP 0332) [4]
   The Python 2 *str* type and its accompanying C support functions are gone and are replaced by the *bytes* type.

When supporting Python 3 in llvm-py, we should focus on these C API changes.

*Timeline*

Before the coding period starts, I will study the llvm-py source code in depth and read the LLVM 3 documentation and code, to speed up the project. The coding period is divided into two stages: before the midterm evaluation, I will port llvm-py to LLVM 3; after the midterm, I will add Python 3 support to llvm-py.

May 21 ~ May 27       Support the IR type system of LLVM 3
May 28 ~ June 3       Support the new Value subclasses and Instruction subclasses
June 4 ~ June 10      Deal with the pass framework
June 11 ~ June 17     Improve PHINode class support
June 18 ~ June 24     Deal with other features, such as intrinsics
June 25 ~ July 1      Test LLVM 3 support and get it into good shape
July 2 ~ July 8       Document LLVM 3 support in llvm-py
July 9 ~ July 15      Midterm evaluation
July 16 ~ July 22     Add Python 3 support and make it basically work
July 23 ~ July 29     Debug and improve Python 3 support
July 30 ~ August 5    Test Python 3 support and get it into good shape
August 6 ~ August 12  Document Python 3 support

*Project experience*

In GSoC 2009, I took part in a project to support the Scilab language in SWIG [5]. I added a backend module to SWIG so that it supports all the C features for the Scilab language: variables, functions, constants, enums, structs, unions, pointers and arrays.

In GSoC 2010, I also successfully finished a project called “epfs” [6], which stands for embedding Python from Scilab. This project introduces a mechanism to load and use Python code from Scilab.

I have about one year of experience with LLVM. I use it mainly to implement control flow integrity for operating systems and thus improve system security. I recently submitted a patch for the Target.h file to improve compatibility with SWIG, which has been applied to the trunk.
*Biography*

Name: Baozeng Ding
University: Institute of Software, Chinese Academy of Sciences
Email: sploving1 at gmail.com
IRC name: sploving

*References*

[1] http://code.google.com/p/llvm-py/
[2] http://www.python.org/dev/peps/pep-3121/
[3] http://www.python.org/dev/peps/pep-3123/
[4] http://www.python.org/dev/peps/pep-0332/
[5] http://code.google.com/p/google-summer-of-code-2009-swig/downloads/list
[6] http://forge.scilab.org/index.php/p/epfs/

*Appendix*

*llvm-py Implementation*

Here I give a small example to show the relationship between a Python function in llvm-py and the corresponding C function in LLVM. Let us analyze this statement from llvm-py:

    f_sum = my_module.add_function(ty_func, "sum")

How does this statement end up calling the LLVM C function?

The llvm-py package has six modules, of which the most important is the core module. It consists of the following files:

*core.py*   high-level support code
*_core.c*   low-level wrapper code for the LLVM Core libraries
*wrap.h*    header files and macros needed by the low-level wrapper code

In *core.py* there is a class "Module" with a method "add_function", defined as follows:

    def add_function(self, ty, name):
        """Add a function of given type with given name."""
        return Function.new(self, ty, name)

This method calls the static constructor method of class "*Function*" (Function.new). So let's take a look at that constructor. It is also defined in *core.py*:

    class Function(GlobalValue):

        @staticmethod
        def new(module, func_ty, name):
            check_is_module(module)
            check_is_type(func_ty)
            return _make_value(_core.LLVMAddFunction(module.ptr, name, func_ty.ptr))

The most important statement in this constructor is:

    _core.LLVMAddFunction(module.ptr, name, func_ty.ptr)

If you are familiar with C extensions for Python, you can guess that LLVMAddFunction is defined in the low-level wrapper file *_core.c*. Let's find out how it is defined in this wrapper file.
In *_core.c*, the following entries are what we are looking for:

    static PyMethodDef core_methods[] = {
        ...
        /* Functions */
        _method( LLVMAddFunction )
        ...
    };

The entry for LLVMAddFunction is produced by a macro. What does the macro _method mean? It is defined in *_core.c*:

    #define _method( func ) { # func , _w ## func , METH_VARARGS },

In this macro, # func is the name used in Python, and _w ## func is the name of the corresponding wrapper function. That is, when we call a function func from Python, it actually calls the C wrapper function _w ## func. So when we use the LLVMAddFunction method in Python, it actually calls _wLLVMAddFunction.

How, then, is _wLLVMAddFunction defined? Also in *_core.c*, there is a statement related to LLVMAddFunction:

    _wrap_objstrobj2obj(LLVMAddFunction, LLVMModuleRef, LLVMTypeRef, LLVMValueRef)

This macro is defined in the file *wrap.h*:

    /**
     * Wrap LLVM functions of the type
     * outtype func(intype1 arg1, const char *arg2, intype3 arg3)
     */
    #define _wrap_objstrobj2obj(func, intype1, intype3, outtype)    \
    static PyObject *                                               \
    _w ## func (PyObject *self, PyObject *args)                     \
    {                                                               \
        PyObject *obj1, *obj3;                                      \
        intype1 arg1;                                               \
        const char *arg2;                                           \
        intype3 arg3;                                               \
                                                                    \
        if (!PyArg_ParseTuple(args, "OsO", &obj1, &arg2, &obj3))    \
            return NULL;                                            \
                                                                    \
        arg1 = ( intype1 ) PyCObject_AsVoidPtr(obj1);               \
        arg3 = ( intype3 ) PyCObject_AsVoidPtr(obj3);               \
                                                                    \
        return ctor_ ## outtype ( func (arg1, arg2, arg3));         \
    }

So the statement above undergoes macro expansion to become:

    static PyObject *
    _wLLVMAddFunction (PyObject *self, PyObject *args)  /* This is what we are looking for! */
    {
        PyObject *obj1, *obj3;
        LLVMModuleRef arg1;
        const char *arg2;
        LLVMTypeRef arg3;

        if (!PyArg_ParseTuple(args, "OsO", &obj1, &arg2, &obj3))
            return NULL;

        arg1 = ( LLVMModuleRef ) PyCObject_AsVoidPtr(obj1);
        arg3 = ( LLVMTypeRef ) PyCObject_AsVoidPtr(obj3);

        return ctor_LLVMValueRef( LLVMAddFunction (arg1, arg2, arg3));
    }

We get the function *_wLLVMAddFunction* that we are looking for.
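The two macros can be mimicked in Python to make their roles easier to see. This is an analogy only, not llvm-py code: Handle, wrap_objstrobj2obj, method and the toy LLVMAddFunction below are hypothetical names chosen to mirror the C macros, and a closure stands in for the generated C function. The sketch shows a wrapper factory (unwrap two handles, pass a string through, re-wrap the result) and a method table keyed by the exported name.

```python
# Hypothetical opaque handle, standing in for a PyCObject that holds a C pointer.
class Handle:
    def __init__(self, raw):
        self.raw = raw


def wrap_objstrobj2obj(func):
    """Build a wrapper for functions shaped like
    outtype func(intype1 arg1, const char *arg2, intype3 arg3)."""
    def wrapper(obj1, name, obj3):
        # Mirrors the "OsO" format string: object, string, object.
        if not isinstance(name, str):
            raise TypeError("argument 2 must be a string")
        result = func(obj1.raw, name, obj3.raw)  # unwrap handles, then call
        return Handle(result)                    # re-wrap, like ctor_ ## outtype
    wrapper.__name__ = "_w" + func.__name__      # analogue of _w ## func
    return wrapper


def method(func):
    """Analogue of _method(func): pair the exported name with its wrapper."""
    return func.__name__, wrap_objstrobj2obj(func)


# Toy replacement for the real C function; it only echoes its arguments.
def LLVMAddFunction(module, name, function_ty):
    return ("function", module, name, function_ty)


# Analogue of the core_methods[] table: exported name -> wrapper.
core_methods = dict([method(LLVMAddFunction)])

result = core_methods["LLVMAddFunction"](Handle("M"), "sum", Handle("T"))
```

Looking up "LLVMAddFunction" in the table yields the wrapper named _wLLVMAddFunction, which plays the same role here as the expanded C function _wLLVMAddFunction in *_core.c*.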
As shown in the last statement of _wLLVMAddFunction:

    return ctor_LLVMValueRef( LLVMAddFunction (arg1, arg2, arg3));

we finally reach the C function that my_module.add_function in the example calls: *LLVMAddFunction*, which is declared in the header file *Core.h* (llvm-c/Core.h) of the LLVM libraries:

    LLVMValueRef LLVMAddFunction(LLVMModuleRef M, const char *Name, LLVMTypeRef FunctionTy);

--
Best Regards,
Baozeng Ding
OSTG,NFS,ISCAS