thr3ads.net - llvm dev - [LLVMdev] Extending LLVM for high-level types [Jan 2011]


Alexandre Cossette

2011-Jan-12 20:42 UTC

[LLVMdev] Extending LLVM for high-level types

Hi all,

I'm designing a programming language named C³ (or C3). I'm already using
LLVM as a back-end for my prototype compiler and it's wonderful to use.
Thanks for such a great system!

I now have more ambitious goals and I would like to use the LLVM IR as my
internal C³ IR. C³ is designed to support what I call "value-oriented
programming" and it fits naturally with the design of LLVM. The idea is to
apply SSA-based optimizations on user-defined types.

I would like to know if you think this plan makes sense:
- Add a new derived type that is uniqued by name for C³ types
- Add new intrinsic functions for C³ expressions with special semantics
- Emit this "extended LLVM" from my abstract syntax tree
- Run the mem2reg pass as is for SSA construction
- Run optimization passes that can run as is with the new type (like GVN?)
- Run a new pass that lowers the extended LLVM to normal LLVM
- Run (or rerun) normal LLVM optimization passes
- Emit native code using normal LLVM
- Profit!
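To make the "lowering pass" step of this plan concrete, here is a minimal sketch in Python over a made-up toy IR, not LLVM's actual C++ API. The opcode `c3.add`, the type names `c3.int` and `c3.string`, and the runtime function `c3_string_concat` are all hypothetical. The point is only that one high-level operation can lower to very different low-level code depending on its C³ type.

```python
from dataclasses import dataclass

# Hypothetical toy IR (not LLVM's API): an instruction carries an
# opcode, a C³-level type name, a destination and its operands.
@dataclass(frozen=True)
class Inst:
    op: str
    ty: str
    dest: str
    args: tuple

# Hypothetical lowering table: (high-level op, C³ type) -> (low-level op, LLVM type).
# The same "c3.add" becomes integer addition or a runtime string-concat call.
LOWERING = {
    ("c3.add", "c3.int"): ("add", "i64"),
    ("c3.add", "c3.string"): ("call c3_string_concat", "i8*"),
}

def lower(block):
    """Rewrite each high-level instruction into its low-level form."""
    result = []
    for inst in block:
        target = LOWERING.get((inst.op, inst.ty))
        if target is None:
            result.append(inst)  # already low-level: keep unchanged
        else:
            op, ty = target
            result.append(Inst(op, ty, inst.dest, inst.args))
    return result

block = [
    Inst("c3.add", "c3.int", "%x", ("%a", "%b")),
    Inst("c3.add", "c3.string", "%s", ("%t", "%u")),
]
for inst in lower(block):
    print(inst.dest, "=", inst.op, inst.ty, ", ".join(inst.args))
```

After such a pass, nothing C³-specific remains, so the standard LLVM passes in the next steps see ordinary IR.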

Alex

Nick Lewycky

2011-Jan-13 08:46 UTC


Alexandre Cossette wrote:
> Hi all,
>
> I'm designing a programming language named C³ (or C3). I'm already
> using LLVM as a back-end for my prototype compiler and it's wonderful to
> use. Thanks for such a great system!
>
> I now have more ambitious goals and I would like to use the LLVM IR as my
> internal C³ IR.

Absolutely not.

In short, LLVM is its own language. You don't need to extend LLVM IR to 
support your programming language any more than you need to extend x86 
processors to support it.

There's the burden of having that support. For starters LLVM's types are
purely based on the storage that they back. Most languages use type to 
provide static program safety, or possibly semantics (i.e., + means 
string concat on a string but addition on integers). LLVM doesn't do 
that. Further our types are uniqued such that any two types with the 
same in-memory representation have the same LLVM type; we don't discard 
names, but we don't preserve a distinction because there isn't any 
distinction to preserve. That in turn allows us to do fast structural 
comparisons using a pointer comparison.
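A minimal sketch of the uniquing described above, in Python rather than LLVM's C++, with all class and method names hypothetical: types are hash-consed on their structure, so requesting the same structure twice returns the same object, and equality becomes an identity (pointer) check. Two source-level types with identical layout collapse into one.

```python
class Ty:
    """A uniqued type; equality and hashing are by object identity."""
    __slots__ = ("kind", "params")

    def __init__(self, kind, params):
        self.kind = kind
        self.params = params

class TypeContext:
    """Hypothetical model of structural type uniquing (hash-consing)."""

    def __init__(self):
        self._pool = {}

    def get(self, kind, *params):
        # The key is the structure only; a source-level name is not part of it.
        # Nested params are themselves uniqued Ty objects, so hashing them by
        # identity is consistent with structural equality.
        key = (kind, params)
        if key not in self._pool:
            self._pool[key] = Ty(kind, params)
        return self._pool[key]

ctx = TypeContext()
i32 = ctx.get("int", 32)
point = ctx.get("struct", i32, i32)                # e.g. a C³ "Point"
size = ctx.get("struct", ctx.get("int", 32), i32)  # e.g. a C³ "Size"
print(point is size)  # one type: the distinction is not preserved
```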

Then we'd have to extend core passes like mem2reg, gvn, and all of their 
dependencies. These are performance critical pieces of kit, and we 
categorically reject any attempt to push in pieces of infrastructure 
that won't be needed by all users. Put another way, if I want to use 
LLVM for C code on a cell phone, I shouldn't need to pay the 
memory/execution-time price for your LLVM changes to support C³.

Finally, you haven't detailed what benefit you expect out of your 
proposal. Why can't you just lower to the existing IR and get the same 
optimizations out of it? What optimizations aren't possible and why? Can 
we tackle those issues instead? We've gotten very far by designing 
extensions to LLVM which are language-agnostic and can be used by any 
client. For example, if your language has alias analysis optimizations 
that rely on high-level type information, LLVM has a TBAA (type based 
aliasing analysis) design that you could employ to give LLVM the 
additional information it needs to optimize with.
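A simplified Python model of the idea behind the tree-shaped TBAA design mentioned above: the front-end describes its types as a tree, and the optimizer may assume two memory accesses don't alias when neither access type is an ancestor of the other. This is a sketch, not LLVM's metadata format or API, and the node names are hypothetical C³ types.

```python
class TBAANode:
    """Hypothetical node in a tree-shaped type-based alias analysis hierarchy."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent

    def ancestors(self):
        """Yield this node and every node above it, up to its root."""
        node = self
        while node is not None:
            yield node
            node = node.parent

def root_of(node):
    return list(node.ancestors())[-1]

def may_alias(a, b):
    """Conservative query: unrelated trees may alias; within one tree, two
    accesses may alias only if one type is an ancestor of (or equal to) the other."""
    if root_of(a) is not root_of(b):
        return True
    return any(n is b for n in a.ancestors()) or any(n is a for n in b.ancestors())

root = TBAANode("C3 TBAA root")
any_byte = TBAANode("byte", root)       # like C's char: may alias everything below it
money = TBAANode("c3.Money", any_byte)  # hypothetical C³ types
matrix = TBAANode("c3.Matrix", any_byte)

print(may_alias(money, matrix))    # distinct C³ types: no alias
print(may_alias(money, any_byte))  # a byte-wise access may touch a Money
```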

Sorry to sound so negative, but I'm confident that LLVM can provide you 
with the same generated code quality in the same execution time, only 
through a different design than you propose. If you can show us missed 
optimizations (or bad compile time problems) when using the naive 
approach of lowering your high-level types to llvm's low-level types, 
please let us know so we can solve them case-by-case!

Nick


Alexandre Cossette

2011-Jan-13 22:30 UTC


Nick Lewycky wrote:
> Alexandre Cossette wrote:
>> Hi all,
>>
>> I'm designing a programming language named C³ (or C3). I'm
>> already using LLVM as a back-end for my prototype compiler and it's
>> wonderful to use. Thanks for such a great system!
>>
>> I now have more ambitious goals and I would like to use the LLVM IR as
>> my internal C³ IR.
>
> Absolutely not.
>
> In short, LLVM is its own language. You don't need to extend LLVM IR to
> support your programming language any more than you need to extend x86
> processors to support it.

I guess I did not express myself clearly. I know I can have my own IR and
compile that to LLVM IR. I simply see an opportunity to externally extend the
existing system for high-level optimizations instead of writing my own SSA-form
IR from scratch. I had no intention of changing anything inside the LLVM code
base! (Except bugs I might find.)

After further investigation, I see that the Type class hierarchy does not allow
for external extension because of the TypeID enum. Maybe I could hack something
together using opaque types?

> There's the burden of having that support. For starters LLVM's
> types are purely based on the storage that they back. Most languages use type to
> provide static program safety, or possibly semantics (i.e., + means string concat
> on a string but addition on integers). LLVM doesn't do that. Further our
> types are uniqued such that any two types with the same in-memory representation
> have the same LLVM type; we don't discard names, but we don't preserve a
> distinction because there isn't any distinction to preserve. That in turn
> allows us to do fast structural comparisons using a pointer comparison.

Some C³ semantics would be handled as transformations inside my custom pass
(most would already be handled by my front-end). Regarding the fact that types
are "uniqued", I want to note that opaque types are not. To what
extent can I keep those around while executing passes?
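A Python sketch of the idea in this question, assuming opaque types behave nominally (each one is a distinct type): they could act as stand-ins for C³ types until a lowering step swaps in the real layout. The class and function names are hypothetical, and the sketch says nothing about which LLVM passes would tolerate opaque-typed values, which is exactly the open question.

```python
class OpaqueType:
    """Hypothetical nominal placeholder: every instance is a distinct type,
    even if another placeholder will end up with the same layout."""

    def __init__(self, name):
        self.name = name  # for diagnostics only; identity is what counts

def lower_types(values, layouts):
    """Replace each value's opaque placeholder with its concrete layout."""
    return {value: layouts[ty] for value, ty in values.items()}

money = OpaqueType("c3.Money")
celsius = OpaqueType("c3.Celsius")
assert money is not celsius  # kept apart, unlike two structurally equal types

# Both happen to lower to i64; the distinction only disappears at lowering time.
layouts = {money: "i64", celsius: "i64"}
values = {"%price": money, "%temp": celsius}
print(lower_types(values, layouts))
```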

> Then we'd have to extend core passes like mem2reg, gvn, and all of
> their dependencies. These are performance critical pieces of kit, and we
> categorically reject any attempt to push in pieces of infrastructure that
> won't be needed by all users. Put another way, if I want to use LLVM for C
> code on a cell phone, I shouldn't need to pay the memory/execution-time
> price for your LLVM changes to support C³.

As I understand it, mem2reg only relies on "alloca", "store"
and "load" instructions. The algorithm is non-trivial (that's why
I want to use it instead of coding my own) but does not seem to have complicated
dependencies. Am I right?
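To illustrate this reading of mem2reg, here is a simplified Python sketch of promoting one stack slot that lives in a single basic block, over a hypothetical tuple-based toy IR. It covers only the single-block case (no phi nodes; the real pass also places phis across blocks using dominance information), but it shows the property being relied on: in this sketch, both the promotability check and the rewrite look only at how `alloca`, `load` and `store` use the slot, never at the slot's type.

```python
# Hypothetical toy IR: each instruction is a tuple.
#   ("alloca", slot)          ("store", slot, value)
#   ("load", dest, slot)      any other op: (op, *operands)

def is_promotable(block, slot):
    """A slot is promotable if it is only ever the address of loads/stores."""
    for inst in block:
        op = inst[0]
        if op == "alloca" and inst[1] == slot:
            continue
        if op == "load" and inst[2] == slot:
            continue
        if op == "store" and inst[1] == slot and inst[2] != slot:
            continue
        if slot in inst[1:]:
            return False  # the address escapes (e.g. passed to a call)
    return True

def promote_single_block(block, slot):
    """Forward each load to the most recent store; drop the slot entirely."""
    current = "undef"  # value of the slot before any store
    rename = {}        # removed load result -> value that replaces it
    out = []
    for inst in block:
        # Rewrite operands that refer to already-removed loads.
        inst = (inst[0],) + tuple(rename.get(x, x) for x in inst[1:])
        op = inst[0]
        if op == "alloca" and inst[1] == slot:
            continue
        if op == "store" and inst[1] == slot:
            current = inst[2]
            continue
        if op == "load" and inst[2] == slot:
            rename[inst[1]] = current
            continue
        out.append(inst)
    return out

block = [
    ("alloca", "%x"),
    ("store", "%x", "1"),
    ("load", "%a", "%x"),
    ("add", "%b", "%a", "2"),
    ("store", "%x", "%b"),
    ("load", "%c", "%x"),
    ("ret", "%c"),
]
if is_promotable(block, "%x"):
    for inst in promote_single_block(block, "%x"):
        print(inst)
```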

> Finally, you haven't detailed what benefit you expect out of your
> proposal. Why can't you just lower to the existing IR and get the same
> optimizations out of it? What optimizations aren't possible and why? Can we
> tackle those issues instead? We've gotten very far by designing extensions
> to LLVM which are language-agnostic and can be used by any client. For example,
> if your language has alias analysis optimizations that rely on high-level type
> information, LLVM has a TBAA (type based aliasing analysis) design that you
> could employ to give LLVM the additional information it needs to optimize with.

One scenario that I have in mind is being able to do common subexpression
elimination before doing what I call "object allocation" (by analogy
with "register allocation"). The result is fewer temporary objects and
therefore fewer constructor/destructor calls and better resource usage. Unlike
in C++, the transformations are sound because all C³ types are regular, by Alex
Stepanov's definition of "regular types".
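This scenario can be sketched with plain local value numbering in Python, over a hypothetical toy IR in which each tuple creates one temporary object. Running CSE before objects are materialized means a repeated `a + b` reuses the first result, so one temporary (and its constructor/destructor pair) disappears. The soundness assumption, pure operations on regular types, is the one stated above; the encoding is made up for illustration.

```python
def local_cse(block):
    """Local value numbering: reuse the first result of each repeated
    (operation, operands) pair instead of creating a new temporary.

    Assumes every operation is pure and every operand has value semantics,
    so equal inputs always produce equal results."""
    available = {}  # (op, operands) -> name of the first result
    rename = {}     # eliminated result -> surviving result
    out = []
    for dest, op, *args in block:
        args = tuple(rename.get(a, a) for a in args)
        key = (op, args)
        if key in available:
            rename[dest] = available[key]  # no new temporary object needed
            continue
        available[key] = dest
        out.append((dest, op) + args)
    return out, rename

# Hypothetical C³ code: s = (a + b) + (a + b) on strings.
block = [
    ("%t1", "concat", "%a", "%b"),
    ("%t2", "concat", "%a", "%b"),
    ("%t3", "concat", "%t1", "%t2"),
]
optimized, rename = local_cse(block)
print(len(block), "temporaries before,", len(optimized), "after")
```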

Thanks for pointing out TBAA. The comments in LLVM 2.8 say "This is a
work-in-progress. It doesn't work yet, and the metadata format isn't
stable." What's the current status?

> Sorry to sound so negative, but I'm confident that LLVM can provide you
> with the same generated code quality in the same execution time, only through a
> different design than you propose. If you can show us missed optimizations (or
> bad compile time problems) when using the naive approach of lowering your
> high-level types to llvm's low-level types, please let us know so we can
> solve them case-by-case!

It's all right. Don't worry, I don't want to turn LLVM inside out :)
That being said, I still believe there is a way I could use that nice mem2reg
pass for my purpose...

Alex

Andrew Clinton

2011-Jan-14 00:54 UTC


On 01/13/2011 03:46 AM, Nick Lewycky wrote:
> Absolutely not.
>
> In short, LLVM is its own language. You don't need to extend LLVM IR to
> support your programming language any more than you need to extend x86
> processors to support it.
>
> There's the burden of having that support. For starters LLVM's types are
> purely based on the storage that they back. Most languages use type to
> provide static program safety, or possibly semantics (i.e., + means
> string concat on a string but addition on integers). LLVM doesn't do
> that. Further our types are uniqued such that any two types with the
> same in-memory representation have the same LLVM type; we don't discard
> names, but we don't preserve a distinction because there isn't any
> distinction to preserve. That in turn allows us to do fast structural
> comparisons using a pointer comparison.
>
> Then we'd have to extend core passes like mem2reg, gvn, and all of their
> dependencies. These are performance critical pieces of kit, and we
> categorically reject any attempt to push in pieces of infrastructure
> that won't be needed by all users. Put another way, if I want to use
> LLVM for C code on a cell phone, I shouldn't need to pay the
> memory/execution-time price for your LLVM changes to support C³.
>
> Finally, you haven't detailed what benefit you expect out of your
> proposal. Why can't you just lower to the existing IR and get the same
> optimizations out of it? What optimizations aren't possible and why? Can
> we tackle those issues instead? We've gotten very far by designing
> extensions to LLVM which are language-agnostic and can be used by any
> client. For example, if your language has alias analysis optimizations
> that rely on high-level type information, LLVM has a TBAA (type based
> aliasing analysis) design that you could employ to give LLVM the
> additional information it needs to optimize with.
>
> Sorry to sound so negative, but I'm confident that LLVM can provide you
> with the same generated code quality in the same execution time, only
> through a different design than you propose. If you can show us missed
> optimizations (or bad compile time problems) when using the naive
> approach of lowering your high-level types to llvm's low-level types,
> please let us know so we can solve them case-by-case!
>
> Nick

I think that what Alexandre wants to do is to leverage the power of the 
LLVM SSA transformation/optimization framework for types that might not 
be natively defined by LLVM.  This is something that I believe is 
already possible in LLVM (with the addition of some select user-defined 
passes and careful use of types), but it can be awkward to use due to 
the structural typing inherent in LLVM.  For example, I map one of the 
custom types in my language to i64, but this only makes sense as long as 
I can uniquely identify this type as i64, that is, as long as I haven't 
overloaded i64 to mean anything else.  Other types could be introduced as 
other bit-width integers (i65), structure types, etc.  So it's possible, if 
not clean.
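This workaround can be sketched in Python as a bijection between custom type names and integer widths that nothing else in the module uses, so that an `iN` can be decoded back to its custom type before lowering. The starting width, the set of reserved native widths and the class name are all hypothetical choices; the scheme is only sound under the caveat above that no other code uses those widths.

```python
# Widths the front-end uses for genuine integers (an assumption for this sketch).
NATIVE_WIDTHS = {1, 8, 16, 32, 64, 128}

class WidthEncoder:
    """Hypothetical scheme: give each custom type its own otherwise unused iN."""

    def __init__(self, first=65):
        self.next_width = first
        self.by_name = {}
        self.by_width = {}

    def encode(self, name):
        """Return the LLVM integer type spelling reserved for `name`."""
        if name not in self.by_name:
            # Skip widths that real integers (or earlier custom types) already use.
            while self.next_width in NATIVE_WIDTHS or self.next_width in self.by_width:
                self.next_width += 1
            self.by_name[name] = self.next_width
            self.by_width[self.next_width] = name
            self.next_width += 1
        return f"i{self.by_name[name]}"

    def decode(self, llvm_type):
        """Map an 'iN' spelling back to its custom type, or None if native."""
        return self.by_width.get(int(llvm_type[1:]))

enc = WidthEncoder()
print(enc.encode("Vec3"), enc.encode("Color"), enc.encode("Vec3"))
print(enc.decode("i66"), enc.decode("i64"))
```

Before code generation, a lowering pass would still have to replace these placeholder widths with the types' real layouts, since `i65` here carries no storage meaning.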

Actually, looking over the list of optimizations on LLVM IR I'm having 
trouble finding more than a handful that explicitly rely on the storage 
type of all data.  So it seems like a very valid use case to use LLVM 
for optimization with user-specific types within SSA form, before 
lowering the code (or translating back to source).

Andrew
