Hello,

I have been thinking about efficient implementation of dynamically typed languages in my spare time. Specifically, I'm working on a toy implementation of a tiny piece of Python using LLVM as a native code generating JIT. I've run into a bit of an issue involving how Python deals with method calls. I'm not sure how, or if, I can implement this in LLVM. In Python, the following code:

    somefunc = a.method
    somefunc()

Roughly translates into:

    functionObject = lookup( "method" in object a )
    functionObject->functionPointer()

The challenge is that if "method" is actually a method, calling it magically adds "a" as the first parameter. If it is NOT a method, then no messing with the arguments occurs. As far as I can tell, this forces an implementation to create BoundMethod objects that wrap the actual method calls. The question is, how can I implement this efficiently, ideally using LLVM?

My idea is to add a NULL pointer as the first parameter to all function calls. "Normal" functions would ignore it, but methods would look at the first parameter to find the "this" pointer. I could then generate a tiny stub for each bound method that would do the following:

1. Replace the first argument with the appropriate "this"
2. Jump to the real function

Is it possible to do something like this in LLVM? Will it work if I just create a char array and copy in the appropriate native code for the current platform? I would rather let LLVM do the hard work, but if that isn't possible, I'm looking for some acceptable hack.

An additional ugly bit is that these objects will be created and destroyed frequently, so integration with LLVM's memory system is important. The last time I checked, LLVM did not keep track of code in memory, so this would effectively create a memory leak.

Thanks for any help,

Evan Jones
On Fri, 28 Oct 2005, Evan Jones wrote:
> I have been thinking about efficient implementation of dynamically typed
> languages in my spare time. Specifically, I'm working on a toy implementation
> of a tiny piece of Python using LLVM as a native code generating JIT. I've

Cool!

> run into a bit of an issue, involving how Python deals with method calls. I'm
> not sure how/if I can implement this in LLVM. In Python, the following code:

Ok.

> somefunc = a.method
> somefunc()
>
> Roughly translates into:
>
> functionObject = lookup( "method" in object a )
> functionObject->functionPointer()
>
> The challenge is that if "method" is actually a method, calling it magically
> adds "a" as the first parameter. If it is NOT a method, then no messing with
> the arguments occurs. As far as I can tell, this forces an implementation to
> create BoundMethod objects that wrap the actual method calls. The question
> is, how can I implement this efficiently, ideally using LLVM?

Okay. One simple option would be to insert code like this:

  if (isamethod(functionObject))
    functionObject->functionPointer(a)
  else
    functionObject->functionPointer()

> My idea is to add a NULL pointer as the first parameter to all function
> calls. "Normal" functions would ignore it, but methods would look at the
> first parameter to find the "this" pointer. I could then generate a tiny stub
> for each bound method that would do the following:
>
> 1. Replace the first argument with the appropriate "this"
> 2. Jump to the real function
>
> Is it possible to do something like this in LLVM?

Sure, you can do this. Another simple option would be to just make every "function" take a first pointer argument, which it ignores. This would allow the caller to always pass a "this" pointer without knowing anything about the callee.

> Will it work if I just
> create a char array and copy in the appropriate native code for the current
> platform?

Hrm, sometimes, sometimes not.
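The branch-on-kind option can be sketched as compilable C under a few assumptions: the `FunctionObject` layout, the `CallableKind` tag, and the sample callees `answer` and `get_self` are all hypothetical. The receiver is stored in the function object itself, rather than read from the variable `a` at the call site, so that `somefunc()` still works once `a` is no longer at hand.

```c
#include <assert.h>
#include <stddef.h>

/* Toy object type (hypothetical); a real runtime would have a richer header. */
typedef struct Object { int value; } Object;

typedef enum { KIND_FUNCTION, KIND_BOUND_METHOD } CallableKind;

/* Hypothetical callable representation: a tag plus the code pointer.
   Bound methods also remember their receiver ("a" in a.method). */
typedef struct {
    CallableKind kind;
    union {
        Object *(*plain)(void);          /* plain function: no implicit argument */
        Object *(*method)(Object *self); /* method: receives the bound receiver */
    } fn;
    Object *self;                        /* NULL unless kind is KIND_BOUND_METHOD */
} FunctionObject;

/* The runtime check used at every call site. */
static int isamethod(const FunctionObject *f) {
    return f->kind == KIND_BOUND_METHOD;
}

/* What a JIT might emit for "somefunc()": test the tag, then call. */
static Object *call0(const FunctionObject *f) {
    assert(f != NULL);
    if (isamethod(f))
        return f->fn.method(f->self);
    return f->fn.plain();
}

/* Example callees, for illustration only. */
static Object *answer(void) {
    static Object result = { 42 };
    return &result;
}

static Object *get_self(Object *self) {
    return self;
}
```

The cost of this approach is one load and one conditional branch per call, and the call site has to know about both function shapes.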
Code is not always relocatable like that; it sounds dangerous.

> I would rather let LLVM do the hard work, but if that isn't possible,
> I'm looking for some acceptable hack.

LLVM can do it; it's just a matter of picking the right solution. To me, adding a dummy 'this' argument to functions, which they ignore, seems like the simplest and most logical way to do it.

> An additional ugly bit is that these objects will be created and destroyed
> frequently, so integration with LLVM's memory system is important. The last I
> checked, LLVM does not keep track of code in memory, so this would
> effectively create a memory leak.

If possible, I would suggest avoiding creating and destroying lots of little stubs. Even if we teach LLVM to recycle this memory (which wouldn't be that hard), it will still be much less efficient than having a dummy argument for functions. Besides, if the address of these functions is never taken, the standard LLVM optimizations will remove dead arguments.

-Chris

--
http://nondot.org/sabre/
http://llvm.org/
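The dummy-argument convention might look like this in C. This is a minimal sketch assuming a hypothetical `NativeFn` signature with one leading pointer plus one ordinary argument; `identity`, `get_self`, and `call_attr` are illustrative names, not anything provided by LLVM. Because the call site never branches, the only cost for plain functions is one unused parameter, which dead argument elimination can often remove when the function's address is never taken.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Object { int value; } Object;

/* Uniform (hypothetical) signature: every compiled function takes a
   leading pointer, whether or not it is a method. */
typedef Object *(*NativeFn)(Object *self, Object *arg);

/* A plain function simply ignores the leading argument. */
static Object *identity(Object *unused_self, Object *arg) {
    (void)unused_self;
    return arg;
}

/* A method treats the leading argument as its receiver. */
static Object *get_self(Object *self, Object *arg) {
    (void)arg;
    return self;
}

/* Call site for "a.attr(x)": always pass the receiver, no branch needed,
   and the caller does not need to know what kind of callee it has. */
static Object *call_attr(NativeFn fn, Object *receiver, Object *arg) {
    assert(fn != NULL);
    return fn(receiver, arg);
}
```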
On Oct 29, 2005, at 1:04, Chris Lattner wrote:
>> The question is, how can I implement this efficiently, ideally using
>> LLVM?
> Okay. One simple option would be to insert code like this:
>
>   if (isamethod(functionObject))
>     functionObject->functionPointer(a)
>   else
>     functionObject->functionPointer()

Ah yes, the good old-fashioned simple approach. The only change is that by the time I get to the function call, I may no longer have a reference to the object (in the compiler), so I would have to stuff that into the bound method object itself.

> Sure, you can do this. Another simple option would be to just make
> every "function" take a first pointer argument which they ignore.
> This would allow the caller to always pass a this pointer without
> knowing anything about the callee.

Ah, of course! This is probably the best way to do it, since it is so simple. The "FunctionObject" type would contain not only a function pointer, but also a "this" pointer. For normal functions, "this" would be NULL. Why didn't I think of that, since I was halfway to that solution already? That would change the call implementation to the following:

    functionObject->functionPointer( functionObject->thisPointer )

>> Will it work if I just create a char array and copy in the
>> appropriate native code for the current platform?
> Hrm, sometimes, sometimes not. Code is not always relocatable like
> that, it sounds dangerous.

Ah, also a good point. A copying garbage collector, for example, would definitely make things more complicated.

Thanks for your help! I was definitely thinking the wrong way.

Evan Jones

--
Evan Jones
http://evanjones.ca/
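The revised design, where the function object carries both pointers, can be sketched as follows. The field names `functionPointer` and `thisPointer` come from the message; the constructors `make_function` and `bind_method` and the sample callees are hypothetical. The point of the sketch is that binding a method only fills in a small data structure, so no per-binding machine code is generated and the JIT's code memory is never touched.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Object { int value; } Object;
typedef Object *(*NativeFn)(Object *self, Object *arg);

/* Binding creates data, not code. */
typedef struct {
    NativeFn functionPointer;
    Object *thisPointer;   /* NULL for ordinary functions */
} FunctionObject;

static FunctionObject make_function(NativeFn fn) {
    FunctionObject f = { fn, NULL };
    return f;
}

/* Evaluating "a.method": pair the method's code with its receiver. */
static FunctionObject bind_method(NativeFn fn, Object *self) {
    FunctionObject f = { fn, self };
    return f;
}

/* Every call site looks the same, whatever kind of callable it gets. */
static Object *call1(const FunctionObject *f, Object *arg) {
    assert(f != NULL && f->functionPointer != NULL);
    return f->functionPointer(f->thisPointer, arg);
}

/* Example callees, for illustration only. */
static Object *identity(Object *unused_self, Object *arg) {
    (void)unused_self;
    return arg;
}

static Object *get_self(Object *self, Object *arg) {
    (void)arg;
    return self;
}
```

In a real runtime the `FunctionObject` would more likely be heap-allocated and reclaimed by the garbage collector like any other object; it is returned by value here only to keep the sketch leak-free.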
On 10/28/05, Evan Jones <ejones at uwaterloo.ca> wrote:
[snip]
> Will it work if I just
> create a char array and copy in the appropriate native code for the
> current platform? I would rather let LLVM do the hard work, but if that
> isn't possible, I'm looking for some acceptable hack.

(1) The memory page/segment must be marked executable by the OS. Under POSIX systems, this is typically done by creating an anonymous mapping with mmap() and then mprotect()ing the memory. As I remember, POSIX doesn't guarantee that mprotect will work on memory directly allocated with malloc or calloc. I believe some systems allow it, but it's my understanding that this practice is non-portable. The Win32 API has a function similar in name and function to mprotect (VirtualProtect), but I'm not a Win32 guy.

Note: before OSes started setting the x86 NX/XD bit, x86 code was able to get away with the assumption that all readable pages are executable. This doesn't make such code correct.

(2) As already mentioned by others, you need relocatable code for this to work properly.

-Karl
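Point (1) can be illustrated with a minimal POSIX sketch, assuming a Linux- or BSD-like system that provides `MAP_ANONYMOUS` (or the older `MAP_ANON`). The helper name `install_code` is hypothetical. It maps fresh pages read/write, copies the bytes in, and only then switches them to read/execute, so the pages are never writable and executable at the same time. It deliberately does not jump into the copied bytes: producing correct position-independent machine code is the separate problem raised in point (2), and some hardened systems refuse to make anonymous memory executable at all.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* exposes MAP_ANONYMOUS on glibc */
#endif
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#if !defined(MAP_ANONYMOUS) && defined(MAP_ANON)
#define MAP_ANONYMOUS MAP_ANON /* older BSD spelling */
#endif

/* Copy len bytes into fresh anonymous pages, then make them read+execute.
   Returns the mapping (its rounded-up size is stored in *mapped_len),
   or NULL on failure. Release it with munmap(ptr, *mapped_len). */
static void *install_code(const void *code, size_t len, size_t *mapped_len) {
    long page = sysconf(_SC_PAGESIZE);
    assert(code != NULL && len > 0 && mapped_len != NULL && page > 0);

    /* Protections apply per page, so round up to whole pages. */
    size_t size = (len + (size_t)page - 1) / (size_t)page * (size_t)page;

    /* Step 1: writable, not executable. */
    void *mem = mmap(NULL, size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return NULL;

    memcpy(mem, code, len);

    /* Step 2: executable, no longer writable. */
    if (mprotect(mem, size, PROT_READ | PROT_EXEC) != 0) {
        munmap(mem, size);
        return NULL;
    }

    *mapped_len = size;
    return mem;
}
```

On some architectures (ARM, for example) a JIT must also flush the instruction cache for the new range before executing it; GCC and Clang provide `__builtin___clear_cache` for that.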