Hi everybody,

I'm currently looking at LLVM as a possible back-end to a dynamic programming system (in the tradition of Smalltalk) we are developing. I have read most of the llvmdev archives, and I'm aware that some things are 'planned' but not implemented yet. We are willing to contribute the code we'll need for our project, but before I can start coding I'll have to submit to my boss a very concrete proposal on which changes I'll make and how long they're going to take.

So before I can present a concrete proposal, I have some doubts on the design of LLVM and on how some particular constructs should be mapped onto its bytecode representation. Please understand that these questions are not intended to criticize LLVM, but instead to better my understanding of it.

1. Opcodes and intrinsics

Which are the differences between opcodes and intrinsics? How is it determined, for an operation, to implement it as an opcode or as an intrinsic function? As I understand it, compilation passes can both lower intrinsics into opcodes and also replace opcode sequences, so in the end some of them are interchangeable. For example, why is there a store opcode and a llvm_gcwrite intrinsic? Couldn't the front-end just produce stores/volatile stores and then a compilation pass transform them into a write-barrier if necessary?

A possible view of intrinsics could be "operations that don't depend on the target architecture, but instead on the language runtime". But then wouldn't malloc/free be intrinsics?

2. Stack and registers

As the LLVM instruction set has a potentially infinite number of registers which are mapped onto target registers or the stack by the register allocator, why is there a separate stack? I would understand it, if the stack was more accessible, as a way to implement closures, but it has been repeated here that the correct way to do that is to use heap-allocated structures, as functions can't access other functions' stacks. Is it to signal locations that need to be changed in-place?

3. Control transfer

Why are the control transfer operations so high level when compared to actual processors? Usually processors have instructions to jump to a concrete location and everything else is managed by the compiler (saving into the stack, getting result parameters, etc.) depending on the language's calling conventions. In LLVM there's just one way to transfer control, and the only proposal I've seen (http://nondot.org/sabre/LLVMNotes/CustomCallingConventions.txt) keeps this high level. What are the difficulties in having low level transfer control operations, with explicitly managed arguments, saving registers, etc?

Well, that's all for now. Thanks in advance,

Marc Ordinas i Llopis | Tragnarion Studios
On Fri, Oct 22, 2004 at 03:18:00PM +0200, Marc Ordinas i Llopis wrote:
> I'm currently looking at LLVM as a possible back-end to a dynamic
> programming system (in the tradition of Smalltalk) we are developing.

Neat!

> I have read most of the llvmdev archives, and I'm aware that some
> things are 'planned' but not implemented yet. We are willing to
> contribute the code we'll need for our project, but before I can start
> coding I'll have to submit to my boss a very concrete proposal on
> which changes I'll make and how long they're going to take.

Sounds good!

> 1. Opcodes and intrinsics
>
> Which are the differences between opcodes and intrinsics? How is it
> determined, for an operation, to implement it as an opcode or as an
> intrinsic function?

Opcodes are operations that are usually implemented in one or just a few machine instructions, i.e. things that have a direct correlation to a machine ISA, such as 'add', 'mul', 'div', etc. Intrinsics are operations that provide a more complex interaction with the processor (or the language runtime), where the implementation would be so different from one platform to the next that it would be more than just a "few instructions". An example would be the llvm.gc* intrinsics that you mention, which are for garbage-collected languages. Other intrinsics include debugging support, etc.

> As I understand it, compilation passes can both lower intrinsics into
> opcodes and also replace opcode sequences, so in the end some of them
> are interchangeable.

That's not really correct. Intrinsics such as llvm.frameaddress and llvm.returnaddress have no equivalents in LLVM opcodes -- the meaning of these intrinsics is specifically machine-dependent, and LLVM (and its opcodes) is machine-independent, so there is no valid interchange of these intrinsics with any combination of LLVM opcodes.
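To make the distinction concrete, here is a minimal sketch in 2004-era LLVM assembly (pre-2.0 syntax with '%' names, 'sbyte*', and 'uint'; exact details approximate, and the function name is invented for illustration). The intrinsic's result depends on the machine's call-frame layout, so no sequence of target-independent opcodes can compute it:

```llvm
; Sketch only: llvm.returnaddress yields a machine-level value (the
; caller's return address) that ordinary, machine-independent LLVM
; opcodes have no way to express.
declare sbyte* %llvm.returnaddress(uint)

sbyte* %who_called_me() {
    %ra = call sbyte* %llvm.returnaddress(uint 0)
    ret sbyte* %ra
}
```

By contrast, an 'add' opcode has the same meaning on every target; only its encoding differs.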
The llvm.memcpy intrinsic can be lowered into an explicit LLVM loop, but the intrinsic provides a succinct representation of a memory copy and so stays as such. Plus, the reverse analysis -- proving that a loop is essentially a memcpy() -- is harder to do.

> For example, why is there a store opcode and a llvm_gcwrite intrinsic?

The 'store' opcode is for generic writes; gcwrite is specifically for garbage collection. I think someone else will be able to give you more information about the GC intrinsics than I can, however.

> Couldn't the front-end just produce stores/volatile stores and then a
> compilation pass transform them into a write-barrier if necessary?

Again, see above: gcwrite is not a "generic store" in the way that the 'store' opcode is, nor is it a barrier.

> A possible view of intrinsics could be "operations that don't depend
> on the target architecture, but instead on the language runtime". But
> then wouldn't malloc/free be intrinsics?

Good question. Due to the amount of pointer/data analysis in LLVM, it is often necessary to consider memory allocation instructions to see where memory is allocated and deallocated. As such, the malloc/free instructions provide a higher-level construct that makes it easier to manipulate memory-operating instructions, since that is done so often. Consider: with an llvm.malloc intrinsic, one would have to say something like this every time we wanted to analyze memory usage:

  if (CallInst *CI = dyn_cast<CallInst>(I))
    if (CI->getCalledFunction()->getName() == "malloc") {
      // ...
    }

whereas with the malloc instruction raising/lowering pass, you would say:

  if (MallocInst *MI = dyn_cast<MallocInst>(I)) {
    // ...
  }

Given the prevalence of such code, the second way is more efficient and cleaner to write. Note that LLVM will work just fine if you use direct calls to malloc() and never create malloc instructions.

> 2. Stack and registers
>
> As the LLVM instruction set has a potentially infinite number of
> registers which are mapped onto target registers or the stack by the
> register allocator, why is there a separate stack?

The stack is used for two things: storing structures allocated with alloca(), and automatic variables declared to be function-local. The spilling of registers is done automatically (transparently), but the allocation of DATA is explicit in LLVM, in order to analyze pointer aliases.

> I would understand it, if the stack was more accessible, as a way to
> implement closures, but it has been repeated here that the correct way
> to do that is to use heap-allocated structures, as functions can't
> access other functions' stacks. Is it to signal locations that need to
> be changed in-place?

I'm sorry, I don't quite understand this question.

> 3. Control transfer
>
> Why are the control transfer operations so high level when compared to
> actual processors?

Because LLVM is target-independent. It is the job of the backend code generator (see llvm/lib/Target/*) to convert the high-level control transfer 'call' instruction (see below) to the stack-pop/push or register-passing method of the target. These details are largely different from one machine to the next and have no relevance to LLVM analyses.

> Usually processors have instructions to jump to a concrete location
> and everything else is managed by the compiler (saving into the stack,
> getting result parameters, etc.) depending on the language's calling
> conventions.

LLVM has the 'call' instruction that abstracts the target machine's calling convention. See http://llvm.cs.uiuc.edu/docs/LangRef.html#i_call for more information.

> In LLVM there's just one way to transfer control, and
> the only proposal I've seen
> (http://nondot.org/sabre/LLVMNotes/CustomCallingConventions.txt) keeps
> this high level.

This is to have more options for calling conventions that may be specified for a target that are non-standard. Mostly these are optimizations (e.g., some registers do not need to be saved if calling a particular library routine, etc.).

> What are the difficulties in having low level transfer control
> operations, with explicitly managed arguments, saving registers, etc?

Choosing any lower-level mechanism would make LLVM no longer target-independent. Choosing to have the union of all machines is nearly impossible, overly complicated, and unnecessary for any LLVM-to-LLVM analyses and transformations. What is it that you are looking to express that isn't captured by the 'call' instruction?

Hope that helps,

--
Misha Brukman :: http://misha.brukman.net :: http://llvm.cs.uiuc.edu
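The 'call' abstraction described above can be seen in a minimal fragment, written here in 2004-era LLVM syntax (details approximate; function names invented for illustration). Whether the arguments travel in registers or on the stack is decided entirely by the target's code generator; the IR producer writes one 'call' form for every platform:

```llvm
; Sketch only: the same 'call' instruction works unchanged whether the
; target passes %a and %b in registers, on the stack, or otherwise.
int %callee(int %a, int %b) {
    %sum = add int %a, %b
    ret int %sum
}

int %caller() {
    %r = call int %callee(int 10, int 32)
    ret int %r
}
```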
On Fri, 2004-10-22 at 06:18, Marc Ordinas i Llopis wrote:
> Hi everybody,

Hi Marc

> I'm currently looking at LLVM as a possible back-end to a dynamic
> programming system (in the tradition of Smalltalk) we are developing.

Great!

> I have read most of the llvmdev archives, and I'm aware that some things
> are 'planned' but not implemented yet. We are willing to contribute the
> code we'll need for our project, but before I can start coding I'll have
> to submit to my boss a very concrete proposal on which changes I'll make
> and how long they're going to take.
>
> So before I can present a concrete proposal, I have some doubts on the
> design of LLVM and on how some particular constructs should be mapped
> onto its bytecode representation. Please understand that these questions
> are not intended to criticize LLVM, but instead to better my
> understanding of it.

Okay, I'll take a crack at it. Others will probably want to give you better answers than mine :)

> 1. Opcodes and intrinsics
>
> Which are the differences between opcodes and intrinsics? How is it
> determined, for an operation, to implement it as an opcode or as an
> intrinsic function?

The opcodes are generally fixed, as they represent the LLVM mid-level IR. Changing the opcode set can have wide-reaching impact on all the analysis, transform, and codegen passes in LLVM, so it's not a change taken lightly. However, when it makes sense, opcodes are added occasionally. For example, we recently added the "unreachable" instruction, which allows a front-end compiler to identify code locations that should not be reached. This can help some of the passes.

As for intrinsics, these are basically function calls that LLVM knows about. For example, things like memset and memcpy could be implemented as a function but could also be implemented with direct code if the processor supports it. Intrinsics are placeholders for either code generation or invocation of a runtime library function. This is probably where you'd want to extend.

> As I understand it, compilation passes can both lower intrinsics into
> opcodes and also replace opcode sequences, so in the end some of them
> are interchangeable. For example, why is there a store opcode and a
> llvm_gcwrite intrinsic? Couldn't the front-end just produce
> stores/volatile stores and then a compilation pass transform them into a
> write-barrier if necessary?

I believe the llvm_gcwrite intrinsic is for garbage collection, which is entirely optional. If all stores were to have write-barrier semantics, there could be significant performance penalties.

> A possible view of intrinsics could be "operations that don't depend on
> the target architecture, but instead on the language runtime". But then
> wouldn't malloc/free be intrinsics?

Intrinsics are intended to be replaceable by the target's code generation, if possible/necessary, or emulated with a function call if not. Language runtimes should be just that: calls to functions located in libraries.

> 2. Stack and registers
>
> As the LLVM instruction set has a potentially infinite number of
> registers which are mapped onto target registers or the stack by the
> register allocator, why is there a separate stack? I would understand
> it, if the stack was more accessible, as a way to implement closures,
> but it has been repeated here that the correct way to do that is to use
> heap-allocated structures, as functions can't access other functions'
> stacks. Is it to signal locations that need to be changed in-place?

I'm not sure what you mean by a "separate stack". Register allocation can spill registers to THE stack. I'll let someone more knowledgeable about this answer.

> 3. Control transfer
>
> Why are the control transfer operations so high level when compared to
> actual processors? Usually processors have instructions to jump to a
> concrete location and everything else is managed by the compiler (saving
> into the stack, getting result parameters, etc.) depending on the
> language's calling conventions. In LLVM there's just one way to transfer
> control, and the only proposal I've seen
> (http://nondot.org/sabre/LLVMNotes/CustomCallingConventions.txt) keeps
> this high level. What are the difficulties in having low level transfer
> control operations, with explicitly managed arguments, saving registers,
> etc?

I don't think there are any difficulties in having low-level control transfer operations; I think it's more that we don't want them. The point behind the mid-level SSA IR is to make it simple to generate code from a compiler front end and to not restrict choices that could be made by analysis, transform, and codegen passes. The fact that we have a relatively high level for control transfer operations goes right along with the rest of the LLVM IR (e.g. gep, call, typed operators, malloc, free). The point is to make a simple, consistent, small IR with the following goals:

* front ends can easily generate code for it, because the number of instructions is small and the model is simple;
* analyses and transforms are simplified, because they don't have to reason over the mind-numbing complexity of lower-level instruction sets;
* code generation has significant freedom in the code it generates.

> Well, that's all for now. Thanks in advance,
>
> Marc Ordinas i Llopis | Tragnarion Studios

Thanks for the interesting questions!

Reid.
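Reid's example of the recently added "unreachable" instruction can be pictured with a small fragment in 2004-era LLVM syntax (details approximate; the function is invented for illustration). A front end marks a path it knows can never execute, which later passes may exploit or delete:

```llvm
; Sketch only: 'unreachable' tells the optimizers that control can
; never arrive at this point, without requiring a dummy return.
void %check(bool %ok) {
    br bool %ok, label %fine, label %impossible
fine:
    ret void
impossible:
    unreachable
}
```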
On Fri, 22 Oct 2004, Marc Ordinas i Llopis wrote:
> Hi everybody,

Hi!

> I'm currently looking at LLVM as a possible back-end to a dynamic
> programming system (in the tradition of Smalltalk) we are developing.

Very cool!

> I have read most of the llvmdev archives, and I'm aware that some things
> are 'planned' but not implemented yet. We are willing to contribute the
> code we'll need for our project, but before I can start coding I'll have
> to submit to my boss a very concrete proposal on which changes I'll make
> and how long they're going to take.

Ok.

> So before I can present a concrete proposal, I have some doubts on the
> design of LLVM and on how some particular constructs should be mapped
> onto its bytecode representation. Please understand that these questions
> are not intended to criticize LLVM, but instead to better my
> understanding of it.

Sure. Before we begin, let me point out that IR design is really a black art of balancing various forces. Adding stuff to the IR makes it more powerful but also more complex to deal with and process (making it more likely that bugs occur). However, we need to add things when there are features or capabilities that cannot be represented in LLVM. Finally, LLVM is not immutable: though it's quite stable now, we are always interested in feedback and ideas. :)

> 1. Opcodes and intrinsics
>
> Which are the differences between opcodes and intrinsics? How is it
> determined, for an operation, to implement it as an opcode or as an
> intrinsic function?

This is a hard thing to really decide, as there frankly isn't much difference. Adding intrinsics is far easier than adding instructions, but intrinsics cannot do some things (for example, they cannot be terminator instructions). I tend to prefer making only very simple and orthogonal operations instructions, relegating the rest to intrinsics. In practice, the biggest difference might be in the bytecode encoding: instructions are encoded more densely than intrinsics.

> As I understand it, compilation passes can both lower intrinsics into
> opcodes and also replace opcode sequences, so in the end some of them
> are interchangeable. For example, why is there a store opcode and a
> llvm_gcwrite intrinsic?

As pointed out by others, these are completely different operations. In particular, the gcwrite intrinsic is used by the front-end to indicate that a write barrier may be needed. Depending on the implementation of the garbage collector, this may expand to code or to just a normal store instruction.

> Couldn't the front-end just produce stores/volatile stores and then a
> compilation pass transform them into a write-barrier if necessary?

Sort of. The problem with this is that (without gcwrite) there is no way to identify the stores that should be turned into write barriers. In particular, the heap may have multiple parts to it, some of which are GC'd and some of which are not. For example, the .NET framework has different pointer types for managed and unmanaged pointers: without the ability to represent both in LLVM, we could not support it.

> A possible view of intrinsics could be "operations that don't depend on
> the target architecture, but instead on the language runtime".

There are at least these categories of intrinsics:

1. Target-specific intrinsics: these are things like llvm.returnaddress and the varargs stuff that are specific to code generators: there is simply no way to express this in LLVM without an intrinsic. These could be made into LLVM instructions, but doing so would provide no real advantage.

2. GC intrinsics: these are their own category because they require support from the LLVM code generator and the GC runtime library to interpret.

3. Extended language intrinsics: these include llvm.memcpy and friends. Note that the intrinsics are different (more powerful) than the libc equivalents, because they allow alignment info to be expressed.

> But then wouldn't malloc/free be intrinsics?

Heh, good question. This is largely historical, but you're right, they probably should have been made intrinsics. Perhaps in the future they will be converted to intrinsics, but for now they remain instructions.

> 2. Stack and registers
>
> As the LLVM instruction set has a potentially infinite number of
> registers which are mapped onto target registers or the stack by the
> register allocator,

Yes.

> why is there a separate stack? I would understand it, if the stack was
> more accessible, as a way to implement closures, but it has been
> repeated here that the correct way to do that is to use heap-allocated
> structures, as functions can't access other functions' stacks. Is it to
> signal locations that need to be changed in-place?

I'm not sure what you mean. In particular, the alloca instruction is used to explicitly allocate stack space. Because it is not possible to take the address of LLVM registers, this is the mechanism we use to allocate stack space. Certain languages do not need a stack, and thus do not need to use alloca; other languages (e.g. C) do. If you clarify your question I'll be able to give a more satisfactory answer.

> 3. Control transfer
>
> Why are the control transfer operations so high level when compared to
> actual processors? Usually processors have instructions to jump to a
> concrete location and everything else is managed by the compiler (saving
> into the stack, getting result parameters, etc.) depending on the
> language's calling conventions.

The idea is to make the producer of the LLVM code as independent from the target as possible. In particular, a front-end for a type-safe language should be able to produce a type-safe module that works on all targets.

> In LLVM there's just one way to transfer control, and the only proposal
> I've seen
> (http://nondot.org/sabre/LLVMNotes/CustomCallingConventions.txt) keeps
> this high level. What are the difficulties in having low level transfer
> control operations, with explicitly managed arguments, saving registers,
> etc?

I'm not sure what it is that you are trying to do. The abstraction provided by the call instruction is good for the optimizers and analyses (because they see exactly what they need, not extra details), good for compilation speed, and good for target independence. Custom calling conventions (link above) are specifically designed to give the code generator the flexibility to pick the most efficient calling convention for a particular call/return pair.

Given all of the above (efficient compiled code, easy-to-write analyses/xforms, happy front-ends, fast compile times), I'm not sure what your approach would give. Care to elaborate?

-Chris

--
http://llvm.org/
http://nondot.org/sabre/
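Chris's point about alloca can be illustrated with a fragment in 2004-era LLVM syntax (details approximate; the function names are invented). LLVM registers have no address, so a local whose address escapes must live in an alloca'd stack slot:

```llvm
; Sketch only: %slot is an addressable stack location whose pointer can
; be handed to another function -- something a plain SSA register
; cannot do.
declare void %init(int*)

int %local_addr() {
    %slot = alloca int            ; reserve one int on this frame's stack
    call void %init(int* %slot)   ; pass the slot's address away
    %v = load int* %slot
    ret int %v
}
```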
On Sat, 23 Oct 2004, Misha Brukman wrote:
> > A possible view of intrinsics could be "operations that don't depend
> > on the target architecture, but instead on the language runtime". But
> > then wouldn't malloc/free be intrinsics?
>
> Good question. Due to the amount of pointer/data analysis in LLVM, it
> is often necessary to consider memory allocation instructions to see
> where memory is allocated and deallocated. As such, the malloc/free
> instructions provide that higher-level construct that make it easier to
> manipulate memory-operating instructions since that is done so often.

This information could be provided by an intrinsic just as well.

> Consider: for an llvm.malloc intrinsic, one would have to say something
> like this every time we wanted to analyze memory usage:
>
>   if (CallInst *CI = dyn_cast<CallInst>(I))
>     if (CI->getCalledFunction()->getName() == "malloc") {
>       // ...
>     }
>
> whereas with the malloc instruction raising/lowering pass, you would say
>
>   if (MallocInst *MI = dyn_cast<MallocInst>(I)) {

Not true at all. Consider the "llvm/IntrinsicInst.h" header, which lets us write stuff like this, today:

  if (MemCpyInst *MCI = dyn_cast<MemCpyInst>(I)) {
    MCI->getSource();
    MCI->getLength();
    ...
  }

The one advantage that MallocInst has over using an intrinsic is that instructions can have different return types in various parts of the program (e.g., you can write 'malloc int' instead of '(int*)malloc(4)').

-Chris

--
http://llvm.org/
http://nondot.org/sabre/
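Chris's last point -- 'malloc int' versus '(int*)malloc(4)' -- looks roughly like this in 2004-era LLVM syntax (details approximate; function names invented for illustration):

```llvm
; Sketch only: the malloc *instruction* is typed, while a call to libc
; malloc returns an untyped byte pointer that must be cast.
declare sbyte* %malloc(uint)

int* %typed_alloc() {
    %p = malloc int                   ; %p has type int*, size implied
    ret int* %p
}

int* %untyped_alloc() {
    %q = call sbyte* %malloc(uint 4)  ; %q has type sbyte*
    %qi = cast sbyte* %q to int*      ; explicit cast needed
    ret int* %qi
}
```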
Thanks all for the fast answers, I'm certainly understanding LLVM better. Some more comments below:

Chris Lattner wrote:
> > Couldn't the front-end just produce stores/volatile stores and then a
> > compilation pass transform them into a write-barrier if necessary?
>
> Sort of. The problem with this is that (without gcwrite) there is no way
> to identify the stores that should be turned into write barriers. In
> particular, the heap may have multiple parts to it, some of which are GC'd
> and some are not. For example, the .NET framework has different pointer
> types for managed and unmanaged pointers: without the ability to represent
> both in LLVM, we could not support it.

Ok, I understand this now. Would allocation to different heaps be represented using different intrinsics, then?

Also, some answers were provided by Reid Spencer, so I'm merging the two threads here:

> Intrinsics are intended to be replaceable by the target's code
> generation, if possible/necessary, or emulated with a function call if
> not. Language runtimes should be just that: calls to functions located
> in libraries.

But sometimes it's more efficient to have knowledge of the runtime in the compiler, e.g. inlining method lookup. Would that be implemented using a lowering pass to substitute the function call with the runtime code?

> > 2. Stack and registers
>
> I'm not sure what you mean. In particular, the alloca instruction is used
> to explicitly allocate stack space. Because it is not possible to take
> the address of LLVM registers, this is the mechanism we use to allocate
> stack space. Certain languages do not need a stack, and thus do not need
> to use alloca; other languages (e.g. C) do. If you clarify your question
> I'll be able to give a more satisfactory answer.

So would it be possible to pass a pointer to a structure allocated on the stack to a called function? As to what I'm trying to do, more below.

Reid Spencer wrote:
> I'm not sure what you mean by a "separate stack". Register allocation
> can spill registers to THE stack. I'll let someone more knowledgeable
> about this answer.

I meant a stack separate from the infinite registers. I see now that we need a stack for taking the address of objects, thanks.

> > 3. Control transfer
>
> I'm not sure what it is that you are trying to do. The abstraction
> provided by the call instruction is good for the optimizers and analyses
> (because they see exactly what they need, not extra details), good for
> compilation speed, and good for target independence. Custom calling
> conventions (link above) are specifically designed to give the code
> generator the flexibility to pick the most efficient calling convention
> for a particular call/return pair.
>
> Given all of the above (efficient compiled code, easy-to-write
> analyses/xforms, happy front-ends, fast compile times), I'm not sure what
> your approach would give. Care to elaborate?

Right now, I'm just trying to think about how I'm going to map our language features onto LLVM, to see if we can use it as our back-end with as little modification as possible.

Thanks everyone,

Marc Ordinas i Llopis | Tragnarion Studios
Misha Brukman wrote:
> > 1. Opcodes and intrinsics
>
> That's not really correct. Intrinsics such as llvm.frameaddress and
> llvm.returnaddress have no equivalents in LLVM opcodes -- the meaning of
> these intrinsics is specifically machine-dependent, and LLVM (and its
> opcodes) is machine-independent, so there is no valid interchange of
> these intrinsics with any combination of LLVM opcodes.

Yes, I understand that those intrinsics are mapped differently on different machines, but isn't 'add' mapped differently too? It seems to me that those intrinsics are there to support the GNU C extensions, making them a 'language feature' of sorts. That's why they are intrinsics and not opcodes.

> > 3. Control transfer
>
> LLVM has the 'call' instruction that abstracts the target machine's
> calling convention. See http://llvm.cs.uiuc.edu/docs/LangRef.html#i_call
> for more information.

Yes, it abstracts the target's C calling convention. But I'm trying to see if other types of languages can be implemented using LLVM.

> What is it that you are looking to express that isn't captured by the
> 'call' instruction?

Tail calls, closures, continuations, lazy allocation... Basically I'm trying to see how I am going to implement high-level functions on LLVM.

> Hope that helps,

Certainly, thanks!

Marc Ordinas i Llopis | Tragnarion Studios