Thank you both for your answers!
That part of type inference was my second question. PHP uses a structure
with a union to represent a variable (because a variable can have different
types, like a long, a double, a stream, etc.), but often a single variable
will only have one type throughout the program (e.g. iterating through $i in
a loop). Will LLVM automagically see that we always use the same type for a
certain variable and discard the whole union and use a single scalar (and
also discard all the type checking done in the opcode handlers)? We can do
some type inference on our side if we do a pass on the bytecode, but I would
like to be sure if that's needed or if LLVM will do it on its own.
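To make the question concrete, here is a minimal sketch in C. The `value` struct is a hypothetical, much-simplified stand-in for PHP's variable structure (it is not the real Zend definition), and `generic_add`, `sum_generic` and `sum_specialized` are illustrative names only. The first loop goes through the tagged union and re-checks the type on every operation; the second is the scalar form one would hope to end up with once the counter is known to always be a long.

```c
#include <assert.h>

/* Hypothetical, simplified stand-in for PHP's variable structure:
   a type tag plus a union holding the actual payload. */
typedef enum { T_LONG, T_DOUBLE } val_type;

typedef struct {
    val_type type;
    union {
        long lval;
        double dval;
    } u;
} value;

/* Generic "add" handler: inspects the tags on every call.
   Integer overflow handling is deliberately left out of this sketch. */
static void generic_add(value *result, const value *a, const value *b)
{
    if (a->type == T_LONG && b->type == T_LONG) {
        long r = a->u.lval + b->u.lval;
        result->type = T_LONG;
        result->u.lval = r;
    } else {
        double x = (a->type == T_LONG) ? (double)a->u.lval : a->u.dval;
        double y = (b->type == T_LONG) ? (double)b->u.lval : b->u.dval;
        result->type = T_DOUBLE;
        result->u.dval = x + y;
    }
}

/* Sums 0..n-1 through the tagged union, as a handler-based loop would. */
static long sum_generic(long n)
{
    value i, one, sum;
    i.type = T_LONG;   i.u.lval = 0;
    one.type = T_LONG; one.u.lval = 1;
    sum.type = T_LONG; sum.u.lval = 0;

    while (i.u.lval < n) {
        generic_add(&sum, &sum, &i);  /* sum += i, with type checks */
        generic_add(&i, &i, &one);    /* i += 1, with type checks   */
    }
    assert(sum.type == T_LONG);       /* the tag never changed */
    return sum.u.lval;
}

/* The same loop once i and sum are known to always be longs:
   no union, no tag, no checks. */
static long sum_specialized(long n)
{
    long sum = 0;
    for (long i = 0; i < n; i++)
        sum += i;
    return sum;
}
```

Both functions compute the same sum; what I am asking is whether the optimizer can get from the first form to the second on its own, or whether a type-inference pass on the bytecode has to do that work.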
Well, about the opcode handlers, that's great news that we don't need to
inline them by hand. Now I only need to fix clang to compile PHP :P
Thanks,
Nuno
----- Original Message -----
From: "Gordon Henriksen" <gordonhenriksen at mac.com>
To: "LLVM Developers Mailing List" <llvmdev at cs.uiuc.edu>
Sent: Wednesday, April 23, 2008 12:17 AM
Subject: Re: [LLVMdev] PHP Zend LLVM extension (SoC)
Hi Nuno,
On Apr 22, 2008, at 18:44, Nuno Lopes wrote:
> PHP has a Google Summer of Code project approved to create an LLVM
> extension for PHP's VM (Zend).
> (http://code.google.com/soc/2008/php/appinfo.html?csaid=73D5F5E282F9163F
> ). I'll be mentoring that project (and the student is CC'ed).
> Although I've already contributed a few patches to clang, I haven't
> hacked LLVM much, so I would like to gather some advice before
> misleading the student too much :P
This is very exciting!
> So my idea is to use the current PHP parser to produce PHP bytecode
> and then convert the PHP bytecode to LLVM's bitcode. The extra pass
> to create PHP bytecode seems necessary for now, as it makes things
> simpler in the PHP end. The first step would be to convert the PHP
> bytecode to LLVM by just producing function calls to the PHP
> interpreter opcode handlers. This has two advantages: it's a simple
> task and we can put something working fast. The disadvantage is that
> it would only bypass the opcode dispatcher, leaving not much room for
> optimizations.
As far as I know, this is exactly how Apple's OpenGL shader JIT works
in Mac OS X. Unfortunately, LLVM will rarely make dramatic changes to
your memory representation, so this probably won't be as effective as
it is in the OpenGL context. (LLVM will only do aggregate->scalar
memory reorganizations; it probably won't be able to prove this safe
for a dynamic language very often.) Your challenge in generating very
fast code would likely be one of type inference.
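As a rough illustration of that first step, the toy sketch below contrasts an interpreter loop with the straight-line handler calls a bytecode-to-LLVM translator would effectively emit. Everything in it is hypothetical: the `vm_state` and `op` types, the three opcodes and the stack-machine design are invented for the example and do not reflect the real Zend VM. Only the dispatch loop disappears; each handler still does its work on the same in-memory state.

```c
#include <assert.h>
#include <stddef.h>

/* Toy VM state and bytecode format (hypothetical, not Zend's). */
typedef struct { long stack[16]; size_t sp; } vm_state;
typedef struct { int opcode; long operand; } op;
enum { OP_PUSH, OP_ADD, OP_MUL };

/* Opcode handlers, all sharing one signature so they can be dispatched. */
static void handle_push(vm_state *vm, long operand)
{
    vm->stack[vm->sp++] = operand;
}

static void handle_add(vm_state *vm, long operand)
{
    (void)operand;
    long b = vm->stack[--vm->sp];
    long a = vm->stack[--vm->sp];
    vm->stack[vm->sp++] = a + b;
}

static void handle_mul(vm_state *vm, long operand)
{
    (void)operand;
    long b = vm->stack[--vm->sp];
    long a = vm->stack[--vm->sp];
    vm->stack[vm->sp++] = a * b;
}

typedef void (*handler_fn)(vm_state *, long);
static const handler_fn handlers[] = { handle_push, handle_add, handle_mul };

/* Bytecode for (2 + 3) * 4. */
static const op program[] = {
    { OP_PUSH, 2 }, { OP_PUSH, 3 }, { OP_ADD, 0 },
    { OP_PUSH, 4 }, { OP_MUL, 0 },
};

/* Interpreter: looks up and calls a handler for every opcode at run time. */
static long run_interpreted(const op *code, size_t n)
{
    vm_state vm = { {0}, 0 };
    for (size_t i = 0; i < n; i++)
        handlers[code[i].opcode](&vm, code[i].operand);
    return vm.stack[vm.sp - 1];
}

/* What translating the same bytecode into direct calls amounts to:
   the dispatch is resolved ahead of time, the handlers are unchanged. */
static long run_translated(void)
{
    vm_state vm = { {0}, 0 };
    handle_push(&vm, 2);
    handle_push(&vm, 3);
    handle_add(&vm, 0);
    handle_push(&vm, 4);
    handle_mul(&vm, 0);
    return vm.stack[vm.sp - 1];
}
```

With the calls laid out like this, an optimizer is free to inline the handlers, but the values still live in the VM's stack structure, which is why the memory representation becomes the limiting factor.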
> In the second phase, we would start to inline some simple PHP
> bytecodes, like arithmetic operations and so on, by dumping LLVM
> assembly instead of calling the opcode handler. Eventually we could
> reach a point that no opcode handlers are necessary.
>
> So does this look like a sane thing? Any helpful advice? Other
> question: After having the LLVM assembly, how should the binary code
> be produced, loaded to memory, and then executed? I assume we can
> link directly to the LLVM code generation and optimization libs. And
> does it support dumping the code directly to the memory so that we
> can run it from there without much magic (and then cache it
> somewhere)?
You can use the facilities of ExecutionEngine to run code in-memory
without ever touching the filesystem. The LLVM tutorial has
information on how to do this.
http://llvm.org/doxygen/classllvm_1_1ExecutionEngine.html
http://llvm.org/docs/tutorial/LangImpl4.html
You'll probably want to provide your opcode handlers as an LLVM IR
module. Your JIT can start up and “seed” the execution environment
with the predefined handlers, then progressively incorporate more
functions into the module as execution progresses.
Hope that helps,
Gordon