More questions on the best way to write a compiler using LLVM:
Let's say I have a struct definition that looks like this:
   const int imageSize = 77;
   struct A {
      char B[align(imageSize)];
   }
...where 'align' is some small inline function that rounds up to a
power of two or something. (A common requirement for textures on 3d
graphics cards.)
Now, clearly the compiler needs to evaluate the array size expression
at compile time, since the size of the array needs to be known up
front. My question is, would it generally be better to generate LLVM
code for the expression and run it in the compiler, or would you
basically have to create an interpreter within your compiler that is
capable of executing your AST?
The disadvantages of generating LLVM code for a constant expression are
several, including in particular the handling of errors - if the
constant expression gets a fatal error, you'd prefer not to crash the
compiler. Also, in a cross-compilation environment you'd have to
generate the constant expression for the host platform, rather than
for the target platform.
On the other hand, writing an interpreter means duplicating a lot of
the functionality that's already in LLVM. For example, consider just
the problem of float to int conversions:
   char B[ (int)3.0 ];
Generating code for this is relatively simple; converting
arbitrary-sized APFloats to arbitrary-sized APInts isn't quite as
easy.
Similarly, the mathematical operators directly supported by APFloat
are only a subset of the math operators supported by the LLVM
instructions.
-- 
-- Talin
On Tue, 22 Jan 2008, Talin wrote:
> More questions on the best way to write a compiler using LLVM:

ok

> Now, clearly the compiler needs to evaluate the array size expression
> at compile time, since the size of the array needs to be known up
> front. My question is, would it generally be better to generate LLVM
> code for the expression and run it in the compiler, or would you
> basically have to create an interpreter within your compiler that is
> capable of executing your AST?

It's really up to you, and it may be up to your language spec. One
approach would be to generate a series of llvm::ConstantExpr::get*
method calls, which implicitly constant fold expressions where
possible. For example, if you ask for "add 3+7" you'll get 10. If you
ask for "div 15, 0" you'll get an unfolded constant expr back.

However, if you treat type checking separately from code generation,
your language may require that you diagnose these sorts of things, and
that would mean that you have to implement the constant folding logic
in your frontend. This is what clang does, for example. If you go this
route, you can still use the LLVM APInt and APFloat classes to do
these operations, and maintain the correct precision etc.

-Chris

-- 
http://nondot.org/sabre/
http://llvm.org/
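The frontend-side option Chris describes (fold constants yourself so that errors can be diagnosed) can be sketched as a small evaluator over an AST. This is only an illustration in standard C++: `Expr`, `lit`, `bin`, `EvalResult`, and `evaluate` are hypothetical names and are not part of LLVM or clang. It uses plain 64-bit integers where a real frontend might use APInt, and it leaves out overflow checking.

```cpp
#include <cstdint>
#include <memory>
#include <string>
#include <utility>

// Hypothetical AST for integer constant expressions (not an LLVM type).
struct Expr {
    enum Kind { Literal, Add, Mul, Div } kind;
    std::int64_t value;              // meaningful only when kind == Literal
    std::unique_ptr<Expr> lhs, rhs;  // meaningful only for binary operators
};

// Build a literal node.
std::unique_ptr<Expr> lit(std::int64_t v) {
    std::unique_ptr<Expr> e(new Expr());
    e->kind = Expr::Literal;
    e->value = v;
    return e;
}

// Build a binary operator node.
std::unique_ptr<Expr> bin(Expr::Kind k, std::unique_ptr<Expr> l,
                          std::unique_ptr<Expr> r) {
    std::unique_ptr<Expr> e(new Expr());
    e->kind = k;
    e->value = 0;
    e->lhs = std::move(l);
    e->rhs = std::move(r);
    return e;
}

// Either a folded value (ok == true) or a diagnostic (ok == false).
struct EvalResult {
    bool ok;
    std::int64_t value;
    std::string error;
};

// Fold the expression, reporting errors instead of crashing.
// Overflow detection is deliberately omitted from this sketch.
EvalResult evaluate(const Expr &e) {
    if (e.kind == Expr::Literal) return {true, e.value, ""};
    EvalResult l = evaluate(*e.lhs);
    if (!l.ok) return l;
    EvalResult r = evaluate(*e.rhs);
    if (!r.ok) return r;
    switch (e.kind) {
    case Expr::Add: return {true, l.value + r.value, ""};
    case Expr::Mul: return {true, l.value * r.value, ""};
    case Expr::Div:
        if (r.value == 0)
            return {false, 0, "division by zero in constant expression"};
        return {true, l.value / r.value, ""};
    default: break;
    }
    return {false, 0, "unsupported expression"};
}
```

With this sketch, `evaluate(*bin(Expr::Add, lit(3), lit(7)))` yields a result with `ok` set and `value` equal to 10, while the `15 / 0` case comes back with `ok` cleared and a message the compiler can print as an ordinary diagnostic.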
Talin wrote:-
> On the other hand, writing an interpreter means duplicating a lot of
> the functionality that's already in LLVM. For example, consider just
> the problem of float to int conversions:
>
>    char B[ (int)3.0 ];
>
> Generating code for this is relatively simple; Converting
> arbitrary-sized APFloats to arbitrary-sized APInts isn't quite as
> easy.

APFloat::convertToInteger does just this. Why can't you use it?

> Similarly, the mathematical operators directly supported by APFloat
> are only a subset of the math operators supported by the LLVM
> instructions.

Yes, arbitrary math ops are hard if you want to get correctly rounded
results for any rounding mode, which was a goal for APFloat. But all
IEEE754 ops are represented.

Neil.
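To make the float-to-int conversion under discussion concrete, here is a minimal sketch in standard C++ of a truncating conversion that returns a status instead of silently producing garbage. It is not the APFloat API: `ConvStatus` and `toSignedInt` are hypothetical names, the input is a plain `double` rather than an arbitrary-width float, and only round-toward-zero is handled.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical status codes, loosely modelled on the idea of an
// operation status; these are not LLVM's opStatus values.
enum class ConvStatus { Exact, Inexact, Invalid };

// Truncate x toward zero into a signed integer of `width` bits (1..64).
// On Invalid, `out` is left untouched.
ConvStatus toSignedInt(double x, unsigned width, std::int64_t &out) {
    if (std::isnan(x) || width == 0 || width > 64)
        return ConvStatus::Invalid;
    double t = std::trunc(x);
    // A signed `width`-bit integer covers [-2^(width-1), 2^(width-1)).
    double limit = std::ldexp(1.0, static_cast<int>(width) - 1);
    if (t < -limit || t >= limit)
        return ConvStatus::Invalid;  // also rejects +/- infinity
    out = static_cast<std::int64_t>(t);
    return t == x ? ConvStatus::Exact : ConvStatus::Inexact;
}
```

For the `(int)3.0` example from the thread, `toSignedInt(3.0, 32, v)` reports `Exact` and stores 3; a value such as 3.7 stores 3 and reports `Inexact`, which a frontend could choose to accept or to warn about.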
On 22/01/2008, Talin <viridia at gmail.com> wrote:
> More questions on the best way to write a compiler using LLVM:
>
> Let's say I have a struct definition that looks like this:
>
>    const int imageSize = 77;
>
>    struct A {
>       char B[align(imageSize)];
>    }
>
> ...where 'align' is some small inline function that rounds up to a
> power of two or something. (A common requirement for textures on 3d
> graphics cards.)
>
> Now, clearly the compiler needs to evaluate the array size expression
> at compile time, since the size of the array needs to be known up
> front. My question is, would it generally be better to generate LLVM
> code for the expression and run it in the compiler, or would you
> basically have to create an interpreter within your compiler that is
> capable of executing your AST?
>

One thing that jumped to mind was "Hey, I can write that as a
metafunction!" using C++, so you might be able to find hints in the
way that templates are handled.

On a similar note, there's a paper on Generalized Constant Expressions
( http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2235.pdf )
proposed for the next C++ revision, so you might be able to find other
people doing the same thing. (Though IIRC that proposal doesn't allow
recursion or mutation of values, which drastically simplifies things.)

-- 
Sed quis custodiet ipsos custodes?
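The generalized constant expressions proposal mentioned above became `constexpr` in C++11, so the struct from the original question can be written directly in present-day C++. The following is a minimal sketch assuming a C++11 compiler; the body of `align` is a guess at what "rounds up to a power of two" means, and its second parameter `p` is a helper introduced only for this illustration. C++11 as standardized permits recursion in `constexpr` functions, which is what this version relies on.

```cpp
#include <cstddef>

// Round n up to the next power of two, evaluated at compile time.
// `p` is the candidate power of two; callers normally omit it.
constexpr std::size_t align(std::size_t n, std::size_t p = 1) {
    return p >= n ? p : align(n, p * 2);
}

constexpr int imageSize = 77;

struct A {
    char B[align(imageSize)];  // array bound computed by the compiler
};

// The compiler itself checks the folded value: 77 rounds up to 128.
static_assert(sizeof(A) == 128, "align(77) should be 128");
```

Here the C++ compiler plays the role of the interpreter Talin asks about: it evaluates `align(imageSize)` during compilation and rejects the program if the expression is not a constant.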
Chris Lattner wrote:
> It's really up to you, and it may be up to your language spec. One
> approach would be to generate a series of llvm::ConstantExpr::get* method
> calls, which implicitly constant fold expressions where possible. For
> example, if you ask for "add 3+7" you'll get 10. If you ask for "div 15,
> 0" you'll get an unfolded constant expr back.
>

Thanks, that is exactly what I needed.

> However, if you treat type checking separately from code generation, your
> language may require that you diagnose these sorts of things, and that
> would mean that you have to implement the constant folding logic in your
> frontend. This is what clang does, for example.
>

Since my AST nodes wrap the LLVM Constant nodes, I have the choice of
mapping from my language types to LLVM types or the reverse, when
needed. So I can use whatever is appropriate at any given point.
Neil Booth wrote:
> Talin wrote:-
>
>> On the other hand, writing an interpreter means duplicating a lot of
>> the functionality that's already in LLVM. For example, consider just
>> the problem of float to int conversions:
>>
>>    char B[ (int)3.0 ];
>>
>> Generating code for this is relatively simple; Converting
>> arbitrary-sized APFloats to arbitrary-sized APInts isn't quite as
>> easy.
>>
>
> APFloat::convertToInteger does just this. Why can't you use it?
>

Well, I may be using it wrong. But looking at APFloat.h, I see four
functions that purport to convert to integer:

   opStatus convertToInteger(integerPart *, unsigned int, bool,
                             roundingMode) const;
   opStatus convertFromSignExtendedInteger(const integerPart *,
                                           unsigned int, bool,
                                           roundingMode);
   opStatus convertFromZeroExtendedInteger(const integerPart *,
                                           unsigned int, bool,
                                           roundingMode);
   APInt convertToAPInt() const;

The first three convert to an array of integer parts, which (as far as
I can tell) is not easily convertible into an APInt via any public
methods I have been able to discover so far.

The last function doesn't appear to convert the APFloat into the
nearest integer equivalent, since my experiments with it returned
completely unexpected values; I'm assuming that what is returned is an
APInt containing the bitwise representation of the floating-point
value?

-- Talin
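The distinction Talin suspects, between the bit pattern of a floating-point value and its numeric conversion, can be shown without LLVM. The sketch below is standard C++ and assumes the platform's `double` is a 64-bit IEEE-754 value; `bitsOf` and `valueOf` are hypothetical helper names and say nothing definitive about what `convertToAPInt` does.

```cpp
#include <cstdint>
#include <cstring>

// Reinterpret the 64 bits of a double as an unsigned integer.
// Assumption: double is IEEE-754 binary64 on this platform.
std::uint64_t bitsOf(double x) {
    static_assert(sizeof(double) == sizeof(std::uint64_t),
                  "this sketch expects a 64-bit double");
    std::uint64_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    return bits;
}

// Numeric conversion: truncate toward zero.
// The caller must ensure the value fits in 64 bits.
std::int64_t valueOf(double x) {
    return static_cast<std::int64_t>(x);
}
```

For 3.0, `valueOf` returns 3, whereas `bitsOf` returns 0x4008000000000000 (sign 0, biased exponent 0x400, top fraction bit set). An integer of that magnitude where 3 was expected would match the "completely unexpected values" described above.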