thr3ads.net - llvm dev - [LLVMdev] PHP Zend LLVM extension (SoC) [Apr 2008]

Nuno Lopes

2008-Apr-22 22:44 UTC

[LLVMdev] PHP Zend LLVM extension (SoC)

Hi,

PHP has a Google Summer of Code project approved to create an LLVM extension
for PHP's VM (Zend)
(http://code.google.com/soc/2008/php/appinfo.html?csaid=73D5F5E282F9163F).
I'll be mentoring that project (and the student is CC'ed).
Although I've already contributed a few patches to clang, I haven't hacked
LLVM much, so I would like to gather some advice before misleading the
student too much :P

So my idea is to use the current PHP parser to produce PHP bytecode and then
convert the PHP bytecode to LLVM's bitcode. The extra pass to create PHP
bytecode seems necessary for now, as it makes things simpler on the PHP end.
The first step would be to convert the PHP bytecode to LLVM by just
producing function calls to the PHP interpreter opcode handlers. This has
two advantages: it's a simple task and we can get something working quickly.
The disadvantage is that it would only bypass the opcode dispatcher, leaving
not much room for optimizations.
In the second phase, we would start to inline some simple PHP bytecodes,
like arithmetic operations and so on, by emitting LLVM assembly instead of
calling the opcode handler. Eventually we could reach a point where no opcode
handlers are necessary.
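
To make the two phases concrete, here is a minimal sketch in Python (chosen only for brevity). It is a toy model, not Zend or LLVM code: the three opcodes, the handler names, and the generated Python source standing in for LLVM assembly are all assumptions made for illustration. In a real translation, phase one would emit a straight sequence of call instructions rather than the small loop used here.

```python
# Toy model only: a three-opcode "VM" standing in for Zend. All names are hypothetical.

def op_load_const(env, dst, value, _unused):
    env[dst] = value

def op_add(env, dst, lhs, rhs):
    env[dst] = env[lhs] + env[rhs]

def op_mul(env, dst, lhs, rhs):
    env[dst] = env[lhs] * env[rhs]

HANDLERS = {"LOAD_CONST": op_load_const, "ADD": op_add, "MUL": op_mul}

def interpret(bytecode, env):
    """Baseline: look up the handler for every instruction at run time."""
    for opcode, a, b, c in bytecode:
        HANDLERS[opcode](env, a, b, c)
    return env

def compile_phase_one(bytecode):
    """Phase one: resolve each handler once, leaving only direct calls."""
    calls = [(HANDLERS[opcode], a, b, c) for opcode, a, b, c in bytecode]

    def compiled(env):
        for handler, a, b, c in calls:
            handler(env, a, b, c)
        return env

    return compiled

def compile_phase_two(bytecode):
    """Phase two: emit code for ADD inline; keep calling handlers for the rest."""
    lines = ["def compiled(env, handlers):"]
    for opcode, a, b, c in bytecode:
        if opcode == "ADD":
            lines.append(f"    env[{a!r}] = env[{b!r}] + env[{c!r}]")
        else:
            lines.append(f"    handlers[{opcode!r}](env, {a!r}, {b!r}, {c!r})")
    lines.append("    return env")
    namespace = {}
    exec("\n".join(lines), namespace)  # generated source stands in for emitted IR
    return lambda env: namespace["compiled"](env, HANDLERS)

PROGRAM = [
    ("LOAD_CONST", "x", 2, None),
    ("LOAD_CONST", "y", 5, None),
    ("ADD", "t", "x", "y"),
    ("MUL", "r", "t", "y"),
]
```

All three variants compute the same result for PROGRAM; only the amount of work left to the dispatcher and the handlers differs.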

So does this look like a sane approach? Any helpful advice?
Another question: once we have the LLVM assembly, how should the binary code
be produced, loaded into memory, and then executed? I assume we can link
directly to the LLVM code generation and optimization libs. And does LLVM
support emitting the code directly into memory so that we can run it from
there without much magic (and then cache it somewhere)?


Thanks,
Nuno

Gordon Henriksen

2008-Apr-22 23:17 UTC

[LLVMdev] PHP Zend LLVM extension (SoC)

Hi Nuno,

On Apr 22, 2008, at 18:44, Nuno Lopes wrote:
> PHP has a Google Summer of Code project approved to create an LLVM
> extension for PHP's VM (Zend)
> (http://code.google.com/soc/2008/php/appinfo.html?csaid=73D5F5E282F9163F).
> I'll be mentoring that project (and the student is CC'ed).
> Although I've already contributed a few patches to clang, I haven't
> hacked LLVM much, so I would like to gather some advice before
> misleading the student too much :P

This is very exciting!

> So my idea is to use the current PHP parser to produce PHP bytecode
> and then convert the PHP bytecode to LLVM's bitcode. The extra pass
> to create PHP bytecode seems necessary for now, as it makes things
> simpler on the PHP end. The first step would be to convert the PHP
> bytecode to LLVM by just producing function calls to the PHP
> interpreter opcode handlers. This has two advantages: it's a simple
> task and we can get something working quickly. The disadvantage is that
> it would only bypass the opcode dispatcher, leaving not much room for
> optimizations.

As far as I know, this is exactly how Apple's OpenGL shader JIT works
in Mac OS X. Unfortunately, LLVM will rarely make dramatic changes to
your memory representation, so this probably won't be as effective as
it is in the OpenGL context. (LLVM will only do aggregate->scalar
memory reorganizations; it probably won't be able to prove this safe
for a dynamic language very often.) Your challenge in generating very
fast code would likely be one of type inference.

> In the second phase, we would start to inline some simple PHP
> bytecodes, like arithmetic operations and so on, by emitting LLVM
> assembly instead of calling the opcode handler. Eventually we could
> reach a point where no opcode handlers are necessary.
>
> So does this look like a sane approach? Any helpful advice? Another
> question: once we have the LLVM assembly, how should the binary code
> be produced, loaded into memory, and then executed? I assume we can
> link directly to the LLVM code generation and optimization libs. And
> does LLVM support emitting the code directly into memory so that we
> can run it from there without much magic (and then cache it
> somewhere)?

You can use the facilities of ExecutionEngine to run code in-memory
without ever touching the filesystem. The LLVM tutorial has
information on how to do this.

http://llvm.org/doxygen/classllvm_1_1ExecutionEngine.html
http://llvm.org/docs/tutorial/LangImpl4.html

You'll probably want to provide your opcode handlers as an LLVM IR  
module. Your JIT can start up and “seed” the execution environment  
with the predefined handlers, then progressively incorporate more  
functions into the module as execution progresses.
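
The "seed, then grow" idea can be sketched as follows. This is a toy model in Python, not the ExecutionEngine API: ToyModule, its methods, and the two handlers are hypothetical names, and a dictionary of callables stands in for an LLVM module.

```python
class ToyModule:
    """Stand-in for a JIT module: a table of callable functions by name."""

    def __init__(self, seed_handlers):
        # Start with the predefined opcode handlers already present.
        self.functions = dict(seed_handlers)
        self.compile_count = 0

    def add_function(self, name, bytecode):
        """Translate one user function into calls to the seeded handlers."""
        steps = [(self.functions[opcode], operand) for opcode, operand in bytecode]

        def compiled(value):
            for handler, operand in steps:
                value = handler(value, operand)
            return value

        self.functions[name] = compiled
        self.compile_count += 1

    def call(self, name, sources, argument):
        """Compile a function the first time it is called, then reuse it."""
        if name not in self.functions:
            self.add_function(name, sources[name])
        return self.functions[name](argument)

# Hypothetical handlers and one hypothetical user function, "scale".
SEED = {"ADD": lambda value, n: value + n, "MUL": lambda value, n: value * n}
SOURCES = {"scale": [("MUL", 3), ("ADD", 1)]}

module = ToyModule(SEED)
first = module.call("scale", SOURCES, 4)   # compiles "scale", then runs it
second = module.call("scale", SOURCES, 5)  # reuses the compiled function
```

The point of the sketch is only the lifecycle: handlers exist before any user code runs, and each user function is translated once, on first use.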

Hope that helps,
Gordon

Alain Frisch

2008-Apr-23 06:00 UTC

[LLVMdev] PHP Zend LLVM extension (SoC)

Nuno Lopes wrote:
> The first step would be to convert the PHP bytecode to LLVM by just
> producing function calls to the PHP interpreter opcode handlers.
> [...]
> In the second phase, we would start to inline some simple PHP bytecodes,
> like arithmetic operations and so on, by emitting LLVM assembly instead of
> calling the opcode handler. Eventually we could reach a point where no
> opcode handlers are necessary.

There is a presentation on the LLVM website (by Chris, I guess)
mentioning that this can be done almost automatically, by letting LLVM
compile the PHP opcode handlers themselves, via the gcc or clang
front-end. LLVM can then inline the opcode handlers and apply further
optimizations.

-- Alain

Nuno Lopes

2008-Apr-23 18:44 UTC

[LLVMdev] PHP Zend LLVM extension (SoC)

Thank you both for your answers!
The type inference part was going to be my second question. PHP uses a
structure with a union to represent a variable (because a variable can have
different types, like a long, a double, a stream, etc.), but often a single
variable will only have one type throughout the program (e.g. iterating
through $i in a loop). Will LLVM automagically see that we always use the
same type for a certain variable, discard the whole union, and use a single
scalar (also discarding all the type checking done in the opcode handlers)?
We can do some type inference on our side if we do a pass over the bytecode,
but I would like to know whether that's needed or whether LLVM will do it on
its own.
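
For reference, a pass over the bytecode could look roughly like the sketch below. It is written in Python for brevity and works on a made-up two-opcode bytecode, so the opcode names and the typing rule are assumptions, not Zend's. In particular, the toy rule ignores PHP details such as integer overflow promoting to float.

```python
def infer_types(bytecode):
    """Return {variable: set of type names it may hold} for a toy bytecode."""
    types = {}
    changed = True
    while changed:  # iterate to a fixed point so repeated code paths settle
        changed = False
        for instr in bytecode:
            op, dst = instr[0], instr[1]
            if op == "ASSIGN_CONST":
                new = {type(instr[2]).__name__}
            elif op == "ADD":
                lhs = types.get(instr[2], set())
                rhs = types.get(instr[3], set())
                # Toy rule: anything involving a float is float, otherwise int.
                new = {"float" if "float" in (a, b) else "int"
                       for a in lhs for b in rhs}
            else:
                raise ValueError(f"unknown opcode {op}")
            known = types.setdefault(dst, set())
            if not new <= known:
                known |= new
                changed = True
    return types

def monomorphic(types):
    """Variables that only ever hold one type could skip the union and its checks."""
    return {var: next(iter(ts)) for var, ts in types.items() if len(ts) == 1}

# Hypothetical bytecode for a small loop body.
LOOP = [
    ("ASSIGN_CONST", "i", 0),
    ("ASSIGN_CONST", "one", 1),
    ("ASSIGN_CONST", "x", 1.5),
    ("ADD", "i", "i", "one"),   # $i++ stays int under the toy rule
    ("ADD", "x", "x", "i"),     # float + int gives float
    ("ASSIGN_CONST", "x", 7),   # now $x may also be int
]
```

Here $i and $one come out as single-typed, so a translator could represent them as plain scalars, while $x keeps the full union.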

Well, about the opcode handlers, that's great news that we don't need to
inline them by hand. Now I only need to fix clang to compile PHP :P

Thanks,
Nuno


Razvan Aciu

2008-Apr-24 08:08 UTC

[LLVMdev] PHP Zend LLVM extension (SoC)

Hi Nuno,

This could be a great project. Some PHP opcodes can be optimized a lot by LLVM
(like branches or function calls), while others, like operations on variables,
can't be optimized so easily due to the dynamic nature of PHP. For the latter,
maybe you can use some automatic type inference, like that used in
languages like Haskell, but this is a big project and there are also
mixed cases like adding a number to a string. I think for these you can use
the PHP handlers for now. Even so, I feel that the speed gain will be
considerable.
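
A minimal sketch of that split, in Python for brevity: a guarded fast path for the common int + int case, falling back to a generic handler for mixed cases. Both function names are hypothetical, and generic_add is only a rough stand-in; PHP's real coercion rules are more involved.

```python
def generic_add(lhs, rhs):
    """Stand-in for the interpreter's full add handler, coercing numeric strings."""
    def to_number(value):
        if isinstance(value, (int, float)):
            return value
        try:
            return int(value)
        except ValueError:
            return float(value)
    return to_number(lhs) + to_number(rhs)

def specialized_add(lhs, rhs):
    """Fast path for int + int, with a guard that falls back otherwise."""
    if type(lhs) is int and type(rhs) is int:
        return lhs + rhs
    return generic_add(lhs, rhs)
```

With type inference, the guard itself could be dropped wherever both operands are known to be integers.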
Another thing you can do with only a little more work is to create an
abstraction layer between the webserver module and the content source, one
that works only with LLVM-compiled files (.bc). In that scenario you can
compile PHP files to the LLVM .bc file format. These files can also be used
as a cache, thus eliminating future parsing and compiling time. The speed
gain can be very high, because on heavily accessed sites some pages are
requested hundreds of times per minute. The generated .bc files will call
the handlers from the PHP runtime and libraries where needed.
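
The caching part could work along these lines. This is a sketch in Python under stated assumptions: cached_compile and fake_compile are hypothetical names, fake_compile merely stands in for a real PHP-to-bitcode compiler, and keying the cache on a hash of the source is one possible invalidation scheme among several.

```python
import hashlib
import pathlib
import tempfile

def cached_compile(source_path, cache_dir, compile_fn):
    """Return (compiled bytes, cache hit?) for source_path."""
    source = pathlib.Path(source_path).read_bytes()
    # Key the entry on the content so an edited script is never served stale.
    key = hashlib.sha256(source).hexdigest()
    cache_file = pathlib.Path(cache_dir) / f"{key}.bc"
    if cache_file.exists():
        return cache_file.read_bytes(), True
    compiled = compile_fn(source)
    cache_file.write_bytes(compiled)
    return compiled, False

def fake_compile(source):
    """Hypothetical stand-in for the PHP-to-bitcode compiler."""
    return b"BC:" + source.upper()

with tempfile.TemporaryDirectory() as tmp:
    script = pathlib.Path(tmp) / "index.php"
    script.write_bytes(b"<?php echo 1;")
    first, first_hit = cached_compile(script, tmp, fake_compile)
    second, second_hit = cached_compile(script, tmp, fake_compile)
    script.write_bytes(b"<?php echo 2;")  # editing the script invalidates the entry
    third, third_hit = cached_compile(script, tmp, fake_compile)
```

Only the first request for each version of the script pays the parsing and compiling cost; later requests read the cached file.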
In the long term this abstraction layer, which is in fact a webserver module,
can be used with many frontends that generate .bc code from different
source languages (Ruby, Python, Lua, etc. come to mind),
turning the whole thing into a framework similar to the ones based on the
.class or .NET CLI formats. This of course can be done only if the .bc format
is mature and stable; otherwise it can only be used as a cache.

Good luck,
Razvan
Joachim Durchholz

2008-Apr-24 16:22 UTC

[LLVMdev] Recommendation: aim for FastCGI, not webserver modules (slightly OT)

On Thursday, 24.04.2008, at 11:08 +0300, Razvan Aciu wrote:
> On long term this abstraction layer, which in fact is a webserver module,

Writing a webserver module is probably not the first thing one should
do. All webserver modules have serious trouble with security in a
multiuser environment (not a surprise: the module runs as the Apache
user, so the scripts of the different users could interfere with each
other).
If you target the mass hosting / mass scripting market, start with a
FastCGI application server; these are standalone processes that can be
run with the proper owner set and hence don't have these problems.
Besides, you can use the same FastCGI server for all web servers, while
you'd need to write separate interface code for Apache, lighttpd, Zope,
or whatever you'd want to target.

Regards,
Jo
