Hi,

PHP has a Google Summer of Code project approved to create an LLVM extension for PHP's VM (Zend) (http://code.google.com/soc/2008/php/appinfo.html?csaid=73D5F5E282F9163F). I'll be mentoring that project (and the student is CC'ed). Although I've already contributed a few patches to clang, I haven't hacked on LLVM much, so I would like to gather some advice before misleading the student too much :P

So my idea is to use the current PHP parser to produce PHP bytecode and then convert the PHP bytecode to LLVM's bitcode. The extra pass to create PHP bytecode seems necessary for now, as it makes things simpler on the PHP end.

The first step would be to convert the PHP bytecode to LLVM by just producing function calls to the PHP interpreter's opcode handlers. This has two advantages: it's a simple task, and we can get something working fast. The disadvantage is that it would only bypass the opcode dispatcher, leaving not much room for optimizations.

In the second phase, we would start to inline some simple PHP bytecodes, like arithmetic operations and so on, by emitting LLVM assembly instead of calling the opcode handler. Eventually we could reach a point where no opcode handlers are necessary.

So does this look like a sane thing? Any helpful advice?

Another question: once we have the LLVM assembly, how should the binary code be produced, loaded into memory, and then executed? I assume we can link directly to the LLVM code generation and optimization libs. And does LLVM support emitting the code directly to memory so that we can run it from there without much magic (and then cache it somewhere)?

Thanks,
Nuno
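The first phase described above can be sketched without any LLVM API at all. The following is a minimal illustration, assuming a hypothetical three-opcode stack VM; `Op`, `Instr`, `VM`, and the `handle_*` functions are invented for this example and are not Zend engine names. It contrasts an interpreter that dispatches on every instruction with the straight-line sequence of handler calls a translator would emit for one particular bytecode sequence.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical miniature VM; none of these names come from the Zend engine.
enum class Op : std::uint8_t { PushConst, Add, Mul };

struct Instr {
    Op op;
    long operand;  // only meaningful for PushConst
};

struct VM {
    std::vector<long> stack;
};

// Opcode handlers: ordinary functions that the interpreter reaches
// through its dispatcher.
inline void handle_push(VM& vm, long value) { vm.stack.push_back(value); }

inline void handle_add(VM& vm) {
    assert(vm.stack.size() >= 2);
    const long rhs = vm.stack.back();
    vm.stack.pop_back();
    vm.stack.back() += rhs;
}

inline void handle_mul(VM& vm) {
    assert(vm.stack.size() >= 2);
    const long rhs = vm.stack.back();
    vm.stack.pop_back();
    vm.stack.back() *= rhs;
}

// Interpreter: one dispatch (the switch) per executed instruction.
long interpret(const std::vector<Instr>& code) {
    VM vm;
    for (const Instr& in : code) {
        switch (in.op) {
            case Op::PushConst: handle_push(vm, in.operand); break;
            case Op::Add:       handle_add(vm);              break;
            case Op::Mul:       handle_mul(vm);              break;
        }
    }
    assert(!vm.stack.empty());
    return vm.stack.back();
}

// Bytecode for (2 + 3) * 4.
std::vector<Instr> example_bytecode() {
    return {{Op::PushConst, 2}, {Op::PushConst, 3}, {Op::Add, 0},
            {Op::PushConst, 4}, {Op::Mul, 0}};
}

// What a first-phase translator would produce for example_bytecode():
// the same handler calls in the same order, with no dispatcher in between.
long run_translated_example() {
    VM vm;
    handle_push(vm, 2);
    handle_push(vm, 3);
    handle_add(vm);
    handle_push(vm, 4);
    handle_mul(vm);
    return vm.stack.back();
}
```

Both paths compute the same result; the translated version only removes the per-instruction dispatch, which matches the limitation Nuno mentions. Because the handlers here are visible to the compiler, it is also free to inline them, which is the effect the second phase aims for.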
Hi Nuno,

On Apr 22, 2008, at 18:44, Nuno Lopes wrote:
> PHP has a Google Summer of Code project approved to create an LLVM extension for the PHP's VM (Zend). (http://code.google.com/soc/2008/php/appinfo.html?csaid=73D5F5E282F9163F). I'll be mentoring that project (and the student is CC'ed).
> Although I've already contributed a few patches to clang, I haven't hacked LLVM much, so I would like to gather some advise before misleading the student too much :P

This is very exciting!

> So my idea is to use the current PHP parser to produce PHP bytecode and then convert the PHP bytecode to LLVM's bitcode. The extra pass to create PHP bytecode seems necessary for now, as it makes things simpler in the PHP end. The first step would be to convert the PHP bytecode to LLVM by just producing function calls to the PHP interpreter opcode handlers. This has two advantages: it's a simple task and we can put something working fast. The disadvantage is that it would only bypass the opcode dispatcher, leaving no much room for optimizations.

As far as I know, this is exactly how Apple's OpenGL shader JIT works in Mac OS X. Unfortunately, LLVM will rarely make dramatic changes to your memory representation, so this probably won't be as effective as it is in the OpenGL context. (LLVM will only do aggregate->scalar memory reorganizations; it probably won't be able to prove this safe for a dynamic language very often.) Your challenge in generating very fast code would likely be one of type inference.

> In the second phase, we would start to inline some simple PHP bytecodes, like arithmetic operations and so on, by dumping LLVM assembly instead of calling the opcode handler. Eventually we could reach a point that no opcode handlers are necessary.
>
> So does this looks like a sane thing? Any helpful advise? Other question: After having the LLVM assembly, how should the binary code be produced, loaded to memory, and then executed? I assume we can link directly to the LLVM code generation and optimization libs. And does it support dumping the code directly to the memory so that we can run it from there without much magic (and then cache it somewhere)?

You can use the facilities of ExecutionEngine to run code in-memory without ever touching the filesystem. The LLVM tutorial has information on how to do this.

http://llvm.org/doxygen/classllvm_1_1ExecutionEngine.html
http://llvm.org/docs/tutorial/LangImpl4.html

You'll probably want to provide your opcode handlers as an LLVM IR module. Your JIT can start up and "seed" the execution environment with the predefined handlers, then progressively incorporate more functions into the module as execution progresses.

Hope that helps,
Gordon
Nuno Lopes wrote:
> The first step would be to convert the PHP bytecode to LLVM by just producing function calls to the PHP interpreter opcode handlers.
> [...]
> In the second phase, we would start to inline some simple PHP bytecodes, like arithmetic operations and so on, by dumping LLVM assembly instead of calling the opcode handler. Eventually we could reach a point that no opcode handlers are necessary.

There is a presentation on the LLVM website (by Chris, I guess) mentioning that this can be done almost automatically, by letting LLVM compile the PHP opcode handlers themselves via the gcc or clang front-end. LLVM can then inline the opcode handlers and apply further optimizations.

-- Alain
Thank you both for your answers!

Type inference was going to be my second question. PHP uses a structure with a union to represent a variable (because a variable can have different types, like a long, a double, a stream, etc.), but often a single variable will have only one type throughout the program (e.g. $i when iterating in a loop). Will LLVM automagically see that we always use the same type for a certain variable, discard the whole union in favor of a single scalar, and also discard all the type checking done in the opcode handlers? We can do some type inference on our side with a pass over the bytecode, but I would like to know whether that's needed or whether LLVM will do it on its own.

As for the opcode handlers, it's great news that we don't need to inline them by hand. Now I only need to fix clang so that it can compile PHP :P

Thanks,
Nuno

----- Original Message -----
From: "Gordon Henriksen" <gordonhenriksen at mac.com>
To: "LLVM Developers Mailing List" <llvmdev at cs.uiuc.edu>
Sent: Wednesday, April 23, 2008 12:17 AM
Subject: Re: [LLVMdev] PHP Zend LLVM extension (SoC)

[...]
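To make the type-inference question concrete, here is a minimal sketch assuming a hypothetical, heavily simplified variable representation with only two types; `Value`, `Tag`, `generic_add`, and the two `sum_*` functions are invented for illustration, and the real PHP structure has more types and fields. It shows what a generic handler has to do on every operation and what the same loop looks like once something (a front-end pass over the bytecode, since Gordon's reply suggests LLVM probably cannot prove this on its own) has established that the loop variable is always a long.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, heavily simplified stand-in for a PHP variable.
enum class Tag : std::uint8_t { Long, Double };

struct Value {
    Tag tag;
    union {
        long lval;
        double dval;
    };
};

inline Value make_long(long v) {
    Value r;
    r.tag = Tag::Long;
    r.lval = v;
    return r;
}

inline Value make_double(double v) {
    Value r;
    r.tag = Tag::Double;
    r.dval = v;
    return r;
}

inline double as_double(const Value& v) {
    return v.tag == Tag::Long ? static_cast<double>(v.lval) : v.dval;
}

// Generic handler: inspects both tags on every single call.
Value generic_add(const Value& a, const Value& b) {
    if (a.tag == Tag::Long && b.tag == Tag::Long) {
        return make_long(a.lval + b.lval);
    }
    return make_double(as_double(a) + as_double(b));
}

// Sum 0 + 1 + ... + (n - 1) through the generic handler. Every iteration
// pays for two tag checks per addition and keeps values boxed.
long sum_generic(long n) {
    Value total = make_long(0);
    Value i = make_long(0);
    const Value one = make_long(1);
    while (i.lval < n) {  // i stays a Long because both addends are Longs
        total = generic_add(total, i);
        i = generic_add(i, one);
    }
    return total.lval;
}

// The same loop after type inference has shown that both variables are
// always longs: the union and all tag checks disappear.
long sum_specialized(long n) {
    long total = 0;
    for (long i = 0; i < n; ++i) {
        total += i;
    }
    return total;
}
```

The specialized version is only valid when the inference is sound, so mixed-type operations would still have to go through something like `generic_add`.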
Hi Nuno,

This can be a great project. Some PHP opcodes, such as branches or function calls, can be optimised a lot by LLVM, while others, like operations on variables, can't be optimised so easily due to the dynamic nature of PHP. For the latter, maybe you can use some automatic type inference, like that used in languages such as Haskell, but this is a big project, and there are also mixed cases like adding a number to a string. I think for these you can use the PHP handlers for now. Even so, I feel that the speed gain will be considerable.

Another thing you can do with only a little more work is to create an abstraction layer between the webserver module and the content source, one that works only with LLVM compiled files (.bc). In that scenario you can compile PHP files to the LLVM .bc file format. These files can also be used as a cache, thus eliminating future parsing and compiling times. The speed gain can be very high, because on heavily accessed sites some pages are requested hundreds of times per minute. The generated .bc files will call the handlers from the PHP runtime and libraries where needed.

In the long term this abstraction layer, which in fact is a webserver module, can be used with many front-ends that generate .bc code from different source languages (Ruby, Python, Lua, etc. come to mind), turning the whole thing into a framework similar to the ones based on the .class or .NET CLI formats. This of course can be done only if the .bc format is mature and stable; otherwise it can only be used as a cache.

Good luck,
Razvan
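The caching idea in Razvan's message can be sketched as follows. This is an in-memory illustration only, assuming a hypothetical `CompileCache` class keyed by file path; the recompile-when-the-source-changes rule is an assumption added here, not something stated in the thread, and the "compilation" step is a placeholder rather than a real PHP-to-.bc translation.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a compiled .bc artifact.
struct Compiled {
    std::size_t source_hash = 0;
    std::string bitcode;  // placeholder payload, not real LLVM bitcode
};

class CompileCache {
public:
    // Returns the cached artifact for `path`, compiling only when the
    // path is new or its source text has changed since the last compile.
    const Compiled& get(const std::string& path, const std::string& source) {
        const std::size_t h = std::hash<std::string>{}(source);
        auto it = entries_.find(path);
        if (it != entries_.end() && it->second.source_hash == h) {
            return it->second;  // cache hit: no parsing or compiling
        }
        // Cache miss. A real system would run the parser and the code
        // generator here; this sketch just tags the source text.
        // Note: a hash collision would be mistaken for "unchanged", so a
        // production cache would need a stronger check.
        ++compile_count_;
        Compiled& slot = entries_[path];
        slot.source_hash = h;
        slot.bitcode = "bc:" + source;
        return slot;
    }

    int compile_count() const { return compile_count_; }

private:
    std::unordered_map<std::string, Compiled> entries_;
    int compile_count_ = 0;
};
```

Requesting the same unchanged page repeatedly compiles it once, which is where the gain for frequently requested pages would come from; an on-disk variant would store the payload in .bc files instead of a map.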
Joachim Durchholz
2008-Apr-24 16:22 UTC
[LLVMdev] Recommendation: aim for FastCGI, not webserver modules (slightly OT)
On Thursday, 24.04.2008, at 11:08 +0300, Razvan Aciu wrote:
> In the long term this abstraction layer, which in fact is a webserver module,

Writing a webserver module is probably not the first thing one should do. All webserver modules have serious trouble with security in a multiuser environment (not a surprise: the module runs as the Apache user, so the scripts of multiple users could interfere with each other).

If you target the mass hosting / mass scripting market, start with a FastCGI application server; these are standalone processes that can be run with the proper owner set and hence don't have these problems. Besides, you can use the same FastCGI server for all web servers, while you'd need to write separate interface code for Apache, lighttpd, Zope, or whatever you'd want to target.

Regards,
Jo