Hi all,

I'm designing a programming language named C³ (or C3). I'm already using LLVM as a back-end for my prototype compiler and it's wonderful to use. Thanks for such a great system!

I now have more ambitious goals and I would like to use the LLVM IR as my internal C³ IR. C³ is designed to support what I call "value-oriented programming" and it fits naturally with the design of LLVM. The idea is to apply SSA-based optimizations on user-defined types.

I would like to know if you think this plan makes sense:
- Add a new derived type that is uniqued by name for C³ types
- Add new intrinsic functions for C³ expressions with special semantics
- Emit this "extended LLVM" from my abstract syntax tree
- Run the mem2reg pass as is for SSA construction
- Run optimization passes that can run as is with the new type (like GVN?)
- Run a new pass that lowers the extended LLVM to normal LLVM
- Run (or rerun) normal LLVM optimization passes
- Emit native code using normal LLVM
- Profit!

Alex
Alexandre Cossette wrote:
> Hi all,
>
> I'm designing a programming language named C³ (or C3). I'm already using LLVM as a back-end for my prototype compiler and it's wonderful to use. Thanks for such a great system!
>
> I now have more ambitious goals and I would like to use the LLVM IR as my internal C³ IR.

Absolutely not.

In short, LLVM is its own language. You don't need to extend LLVM IR to support your programming language any more than you need to extend x86 processors to support it.

There's the burden of having that support. For starters LLVM's types are purely based on the storage that they back. Most languages use type to provide static program safety, or possibly semantics (i.e., + means string concat on a string but addition on integers). LLVM doesn't do that. Further our types are uniqued such that any two types with the same in-memory representation have the same LLVM type; we don't discard names, but we don't preserve a distinction because there isn't any distinction to preserve. That in turn allows us to do fast structural comparisons using a pointer comparison.

Then we'd have to extend core passes like mem2reg, gvn, and all of their dependencies. These are performance critical pieces of kit, and we categorically reject any attempt to push in pieces of infrastructure that won't be needed by all users. Put another way, if I want to use LLVM for C code on a cell phone, I shouldn't need to pay the memory/execution-time price for your LLVM changes to support C³.

Finally, you haven't detailed what benefit you expect out of your proposal. Why can't you just lower to the existing IR and get the same optimizations out of it? What optimizations aren't possible and why? Can we tackle those issues instead? We've gotten very far by designing extensions to LLVM which are language-agnostic and can be used by any client.
For example, if your language has alias analysis optimizations that rely on high-level type information, LLVM has a TBAA (type based aliasing analysis) design that you could employ to give LLVM the additional information it needs to optimize with.

Sorry to sound so negative, but I'm confident that LLVM can provide you with the same generated code quality in the same execution time, only through a different design than you propose. If you can show us missed optimizations (or bad compile time problems) when using the naive approach of lowering your high-level types to llvm's low-level types, please let us know so we can solve them case-by-case!

Nick

> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
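Nick's point about uniquing can be illustrated with a minimal Python sketch of hash-consing. This is the general technique behind structural type uniquing: a factory returns the same object for structurally identical requests, so checking whether two types are equal becomes an identity (pointer) check. The `Type` and `TypeContext` classes are hypothetical. They model the idea only and are not LLVM's C++ API.

```python
class Type:
    """A type node that is never mutated after creation; compare with `is`."""

    __slots__ = ("kind", "params")

    def __init__(self, kind, params):
        self.kind = kind      # "int", "ptr" or "struct"
        self.params = params  # tuple: bit width, pointee, or field types

    def __repr__(self):
        if self.kind == "int":
            return f"i{self.params[0]}"
        if self.kind == "ptr":
            return f"{self.params[0]!r}*"
        return "{ " + ", ".join(map(repr, self.params)) + " }"


class TypeContext:
    """Hypothetical context that hash-conses types by their structure."""

    def __init__(self):
        self._table = {}  # structural key -> the single Type with that shape

    def _get(self, kind, params):
        # Child types are already uniqued, so their identity (id) stands in
        # for their whole structure; no deep comparison is ever needed.
        # The table keeps every Type alive, so the ids stay valid.
        key = (kind, tuple(id(p) if isinstance(p, Type) else p for p in params))
        ty = self._table.get(key)
        if ty is None:
            ty = Type(kind, tuple(params))
            self._table[key] = ty
        return ty

    def int_type(self, bits):
        return self._get("int", (bits,))

    def ptr_to(self, pointee):
        return self._get("ptr", (pointee,))

    def struct(self, *fields):
        # No name is part of the key, so equal layouts share one Type.
        return self._get("struct", fields)
```

For example, `ctx.struct(ctx.int_type(64), ctx.int_type(64))` returns the same object whether the front end meant a (hypothetical) C³ `Point` or a `Pair`. That is the distinction Nick says LLVM has no reason to preserve, and it is also what makes the comparison a single pointer check.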
Nick Lewycky wrote:
> Alexandre Cossette wrote:
>> Hi all,
>>
>> I'm designing a programming language named C³ (or C3). I'm already using LLVM as a back-end for my prototype compiler and it's wonderful to use. Thanks for such a great system!
>>
>> I now have more ambitious goals and I would like to use the LLVM IR as my internal C³ IR.
>
> Absolutely not.
>
> In short, LLVM is its own language. You don't need to extend LLVM IR to support your programming language any more than you need to extend x86 processors to support it.

I guess I did not express myself clearly. I know I can have my own IR and compile that to LLVM IR. I simply see an opportunity to extend the existing system externally for high-level optimizations instead of writing my own SSA-form IR from scratch. I had no intention of changing anything inside the LLVM code base! (Except bugs I might find.)

After further investigation, I see that the Type class hierarchy does not allow for external extension because of the TypeID enum. Maybe I could hack something using opaque types?

> There's the burden of having that support. For starters LLVM's types are purely based on the storage that they back. Most languages use type to provide static program safety, or possibly semantics (i.e., + means string concat on a string but addition on integers). LLVM doesn't do that. Further our types are uniqued such that any two types with the same in-memory representation have the same LLVM type; we don't discard names, but we don't preserve a distinction because there isn't any distinction to preserve. That in turn allows us to do fast structural comparisons using a pointer comparison.

Some C³ semantics would be handled as transformations inside my custom pass (most would already be handled by my front-end). Regarding the fact that types are "uniqued", I want to note that opaque types are not.
To what extent can I keep those around while executing passes?

> Then we'd have to extend core passes like mem2reg, gvn, and all of their dependencies. These are performance critical pieces of kit, and we categorically reject any attempt to push in pieces of infrastructure that won't be needed by all users. Put another way, if I want to use LLVM for C code on a cell phone, I shouldn't need to pay the memory/execution-time price for your LLVM changes to support C³.

As I understand it, mem2reg only relies on "alloca", "store" and "load" instructions. The algorithm is non-trivial (that's why I want to use it instead of coding my own) but does not seem to have complicated dependencies. Am I right?

> Finally, you haven't detailed what benefit you expect out of your proposal. Why can't you just lower to the existing IR and get the same optimizations out of it? What optimizations aren't possible and why? Can we tackle those issues instead? We've gotten very far by designing extensions to LLVM which are language-agnostic and can be used by any client. For example, if your language has alias analysis optimizations that rely on high-level type information, LLVM has a TBAA (type based aliasing analysis) design that you could employ to give LLVM the additional information it needs to optimize with.

One scenario that I have in mind is being able to do common subexpression elimination before doing what I call "object allocation" (by analogy with "register allocation"). The result is fewer temporary objects, and therefore fewer constructor/destructor calls and better resource usage. Unlike in C++, the transformations are sound because all C³ types are regular, by Alex Stepanov's definition of "regular types".

Thanks for pointing out TBAA. Comments in LLVM 2.8 say "This is a work-in-progress. It doesn't work yet, and the metadata format isn't stable."
What's the current status?

> Sorry to sound so negative, but I'm confident that LLVM can provide you with the same generated code quality in the same execution time, only through a different design than you propose. If you can show us missed optimizations (or bad compile time problems) when using the naive approach of lowering your high-level types to llvm's low-level types, please let us know so we can solve them case-by-case!

It's all right. Don't worry, I don't want to turn LLVM inside out :) That being said, I still believe there is a way I could use that nice mem2reg pass for my purpose...

Alex
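Alex's reading of mem2reg can be illustrated with a deliberately simplified Python sketch. It promotes stack slots inside a single basic block only. Each `load` is replaced by the value most recently `store`d to the same slot, and the `alloca`, `store` and `load` instructions for that slot disappear. Only slots used purely as the address of loads and stores qualify. The real pass must also handle control flow, which needs dominance information to place phi nodes, so its dependencies go beyond the three instructions. That part is omitted here. The tuple instruction format is hypothetical.

```python
def promote_block(instrs):
    """Promote stack slots to values within ONE basic block (no phis).

    Hypothetical instruction tuples:
      ("alloca", slot)
      ("store", slot, value)
      ("load", dest, slot)
      (op, dest, *operands)   # anything else; dest may be None
    Values are names (str) or constants.
    """
    slots = {ins[1] for ins in instrs if ins[0] == "alloca"}

    # Only slots whose address is used purely as the pointer operand of
    # loads and stores are promotable; any other use lets it escape.
    escaped = set()
    for ins in instrs:
        if ins[0] == "store":
            if ins[2] in slots:  # the address itself is being stored
                escaped.add(ins[2])
        elif ins[0] not in ("alloca", "load"):
            escaped.update(op for op in ins[2:] if op in slots)
    promotable = slots - escaped

    current = {}  # promotable slot -> value most recently stored to it
    renamed = {}  # dest of a removed load -> value it stands for
    out = []

    def resolve(value):
        return renamed.get(value, value)

    for ins in instrs:
        kind = ins[0]
        if kind == "alloca" and ins[1] in promotable:
            current[ins[1]] = "undef"  # a load before any store reads undef
        elif kind == "store" and ins[1] in promotable:
            current[ins[1]] = resolve(ins[2])
        elif kind == "load" and ins[2] in promotable:
            renamed[ins[1]] = current[ins[2]]
        elif kind == "store":
            out.append(("store", ins[1], resolve(ins[2])))
        elif kind in ("alloca", "load"):
            out.append(ins)
        else:
            out.append((kind, ins[1], *map(resolve, ins[2:])))
    return out
```

For a block equivalent to `x = 1; t1 = x + 2; x = t1; return x`, the sketch keeps only the add (now reading the constant `1`) and the return (now reading `t1`). A slot passed to a call is left in memory untouched.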
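The scenario of doing CSE before "object allocation" can also be sketched. The Python below is textbook local value numbering on an invented three-address form. Each result is keyed by its operator and the value numbers of its operands, so a repeated expression reuses the earlier result instead of materializing a new temporary. The operator names (`mul`, `add`) are placeholders for hypothetical C³ operations on some regular type. The rewrite is only safe under the assumption Alex states: operations on regular types are pure, and copying has no observable side effects.

```python
def local_value_numbering(instrs):
    """Local value numbering over straight-line code.

    Each instruction is a hypothetical tuple (dest, op, *operands). Every
    dest is assumed to be assigned exactly once (SSA) and every op is
    assumed pure, which is what regular value types are meant to guarantee.
    Returns the instructions that still need a temporary.
    """
    value_number = {}  # variable or constant -> value number
    expr_table = {}    # (op, operand value numbers) -> dest computing it
    alias = {}         # eliminated dest -> earlier dest with the same value
    kept = []

    def vn(operand):
        operand = alias.get(operand, operand)
        if operand not in value_number:  # inputs and constants
            value_number[operand] = len(value_number)
        return value_number[operand]

    for dest, op, *operands in instrs:
        key = (op, *map(vn, operands))
        if key in expr_table:
            # Same operator on the same values: reuse the earlier result,
            # so no new temporary object has to be constructed.
            alias[dest] = expr_table[key]
        else:
            expr_table[key] = dest
            value_number[dest] = len(value_number)
            kept.append((dest, op, *(alias.get(o, o) for o in operands)))
    return kept
```

Given `t1 = a * b; t2 = a * b; t3 = t1 + t2`, the sketch keeps two instructions (`t1 = a * b; t3 = t1 + t1`). A later object-allocation step would then have two temporaries to place instead of three. With non-regular types, where copies or operators have side effects, the same rewrite could change behavior.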
On 01/13/2011 03:46 AM, Nick Lewycky wrote:
> Absolutely not.
> [...]
> Further our types are uniqued such that any two types with the
> same in-memory representation have the same LLVM type; we don't discard
> names, but we don't preserve a distinction because there isn't any
> distinction to preserve. [...]
>
> Nick

I think that what Alexandre wants to do is to leverage the power of the LLVM SSA transformation/optimization framework for types that might not be natively defined by LLVM. This is something that I believe is already possible in LLVM (with the addition of some select user-defined passes and careful use of types), but it can be awkward to use due to the structural typing inherent in LLVM.

For example, I define one of the custom types in my language as i64, but this only makes sense as long as I can uniquely identify this type as i64, that is, as long as I haven't overloaded i64 to mean anything else. Other types could be introduced as other bit-width integers (i65), structure types, etc. So it's possible, if not clean.

Actually, looking over the list of optimizations on LLVM IR, I'm having trouble finding more than a handful that explicitly rely on the storage type of all data. So it seems like a very valid use case to use LLVM for optimization with user-specific types within SSA form, before lowering the code (or translating back to source).

Andrew
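Andrew's workaround depends on keeping the mapping from language types to LLVM storage types one-to-one, so that a custom pass can recover the source type from the LLVM type alone. A minimal Python sketch of such a registry follows. The class name, the example type names (`Money`, `Angle`, `Count`) and the spelling of LLVM types as plain strings are all illustrative assumptions, not part of any LLVM API.

```python
class TypeEncoding:
    """Hypothetical one-to-one map between source-language types and the
    LLVM types chosen to carry them (LLVM types written as strings)."""

    def __init__(self):
        self._to_llvm = {}
        self._from_llvm = {}

    def assign(self, lang_type, llvm_type):
        # Refuse to overload an LLVM type: the reverse lookup would become
        # ambiguous and a pass could no longer tell the types apart.
        owner = self._from_llvm.get(llvm_type)
        if owner is not None and owner != lang_type:
            raise ValueError(f"{llvm_type} already encodes {owner}")
        previous = self._to_llvm.get(lang_type)
        if previous is not None and previous != llvm_type:
            raise ValueError(f"{lang_type} is already encoded as {previous}")
        self._to_llvm[lang_type] = llvm_type
        self._from_llvm[llvm_type] = lang_type

    def llvm_type(self, lang_type):
        return self._to_llvm[lang_type]

    def lang_type(self, llvm_type):
        """Recover the source type, or None for an ordinary LLVM type."""
        return self._from_llvm.get(llvm_type)
```

For instance, after `assign("Money", "i64")` and `assign("Angle", "i65")`, a later `assign("Count", "i64")` raises `ValueError`. The registry cannot detect a front end that also emits plain `i64` for ordinary integers, which is exactly the overloading Andrew warns about, so that discipline still has to be enforced where IR is generated.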