I am interested in using LLVM to translate C and C++ into high-level
language code. (As an update to an earlier project of mine, Clue, which
used the Sparse compiler library to do this: it targets Lua, Javascript,
Perl 5, C, Java and Common Lisp, with a disturbing amount of success.
See http://cluecc.sourceforge.net for details.)

The obvious place to start on this is the C backend, except in these 2.8
days the C backend is so hedged about with caveats I'm rather wary of
basing anything on it. I also recall seeing comments here that it's due
for a rewrite from scratch, and that various people were looking into
it. Can anyone go into more detail as to what exactly is wrong with the
C backend, and whether this rewrite is happening?

The other thing I could do is to use the LLVMTargetMachine and treat my
HLL as a low-level machine; this gets me a certain amount of good stuff
like register allocation and more optimisations, but the documentation
is still pretty basic (e.g.
http://wiki.llvm.org/Absolute_Minimum_Backend is three short paragraphs)
and I'm not certain as to whether LLVMTargetMachine is suitable. For
example: my HLL can largely be treated as a register machine with an
arbitrary number of registers. Can LLVMTargetMachine handle this?

--
┌─── dg@cowlark.com ───── http://www.cowlark.com ─────
│
│ "I have a mind like a steel trap. It's rusty and full of dead mice."
│ --- Anonymous, on rasfc

On Jan 24, 2011, at 2:01 PM, David Given wrote:

> I am interested in using LLVM to translate C and C++ into high-level
> language code. (As an update to an earlier project of mine, Clue, which
> used the Sparse compiler library to do this: it targets Lua, Javascript,
> Perl 5, C, Java and Common Lisp, with a disturbing amount of success.
> See http://cluecc.sourceforge.net for details.)
>
> The obvious place to start on this is the C backend, except in these 2.8
> days the C backend is so hedged about with caveats I'm rather wary of
> basing anything on it. I also recall seeing comments here that it's due
> for a rewrite from scratch, and that various people were looking into
> it. Can anyone go into more detail as to what exactly is wrong with the
> C backend, and whether this rewrite is happening?
>
> The other thing I could do is to use the LLVMTargetMachine and treat my
> HLL as a low-level machine; this gets me a certain amount of good stuff
> like register allocation and more optimisations, but the documentation
> is still pretty basic (e.g.
> http://wiki.llvm.org/Absolute_Minimum_Backend is three short paragraphs)
> and I'm not certain as to whether LLVMTargetMachine is suitable. For
> example: my HLL can largely be treated as a register machine with an
> arbitrary number of registers. Can LLVMTargetMachine handle this?

You could create a different code generator from clang or use the rewriting
machinery?

-eric

On Jan 24, 2011, at 2:01 PM, David Given wrote:

> I am interested in using LLVM to translate C and C++ into high-level
> language code. (As an update to an earlier project of mine, Clue, which
> used the Sparse compiler library to do this: it targets Lua, Javascript,
> Perl 5, C, Java and Common Lisp, with a disturbing amount of success.
> See http://cluecc.sourceforge.net for details.)

If you're familiar with Sparse, then I strongly recommend basing this
project on Clang ASTs, not basing it on LLVM IR.

-Chris

On 24 Jan 2011, at 22:04, Eric Christopher wrote:

> On Jan 24, 2011, at 2:01 PM, David Given wrote:
>
>> I am interested in using LLVM to translate C and C++ into high-level
>> language code. (As an update to an earlier project of mine, Clue, which
>> used the Sparse compiler library to do this: it targets Lua, Javascript,
>> Perl 5, C, Java and Common Lisp, with a disturbing amount of success.
>> See http://cluecc.sourceforge.net for details.)
>>
>> The obvious place to start on this is the C backend, except in these 2.8
>> days the C backend is so hedged about with caveats I'm rather wary of
>> basing anything on it. I also recall seeing comments here that it's due
>> for a rewrite from scratch, and that various people were looking into
>> it. Can anyone go into more detail as to what exactly is wrong with the
>> C backend, and whether this rewrite is happening?
>>
>> The other thing I could do is to use the LLVMTargetMachine and treat my
>> HLL as a low-level machine; this gets me a certain amount of good stuff
>> like register allocation and more optimisations, but the documentation
>> is still pretty basic (e.g.
>> http://wiki.llvm.org/Absolute_Minimum_Backend is three short paragraphs)
>> and I'm not certain as to whether LLVMTargetMachine is suitable. For
>> example: my HLL can largely be treated as a register machine with an
>> arbitrary number of registers. Can LLVMTargetMachine handle this?
>
> You could create a different code generator from clang or use the rewriting
> machinery?

-- Send from my Jacquard Loom

A better approach would probably be to use Clang's CodeGen lib as
inspiration, and write an equivalent that emitted your high-level language
code instead of LLVM IR.

For example, consider C++ classes: when you convert these to LLVM IR, you
lose all of the information about them other than their structure, and the
vtable is explicitly created for the target ABI. Mapping them to something
like JavaScript, you'd actually want to create a new prototype object for
each class, with one slot for each field and another slot for each method
(and some extra mixin-style stuff if you wanted to support multiple
inheritance).

The same is true even for pure C structures: you'd want to represent these
as objects with named fields. This information is in the Clang AST, but
it's lost by the time you get to LLVM IR. Taking an example from Apple's
Foundation framework, you have two structures:

    typedef struct { CGFloat x, y; } NSPoint;
    typedef struct { CGFloat width, height; } NSSize;

In LLVM IR, these are both something like {double, double}. In JavaScript,
you'd probably want something like:

    function NSPoint() { this.x = 0; this.y = 0; }
    function NSSize() { this.width = 0; this.height = 0; }

This is pretty simple to generate from the Clang AST, but will be a huge
amount of effort to generate from LLVM IR.

David

-- Sent from my PDP-11

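The class mapping described above (one slot per field, one prototype slot per method) could look roughly like the sketch below. The `Counter` and `LoggingCounter` classes are hypothetical and not taken from the thread; this is one plausible shape for such output, not what any existing Clang-based tool emits.

```javascript
// Hypothetical C++ input (an assumption for illustration only):
//   class Counter { int count; public: virtual void add(int n); int get() const; };
//   class LoggingCounter : public Counter { /* records every add() */ };

// One constructor per class, with one slot per field.
function Counter() {
  this.count = 0;
}

// One prototype slot per method; no ABI-specific vtable is required.
Counter.prototype.add = function (n) {
  this.count += n;
};
Counter.prototype.get = function () {
  return this.count;
};

// Single inheritance maps onto the prototype chain.
function LoggingCounter() {
  Counter.call(this);   // initialise the base-class fields
  this.log = [];
}
LoggingCounter.prototype = Object.create(Counter.prototype);
LoggingCounter.prototype.constructor = LoggingCounter;

// Overriding a virtual method replaces the slot on the derived prototype.
LoggingCounter.prototype.add = function (n) {
  this.log.push(n);
  Counter.prototype.add.call(this, n);
};
```

Used as `var c = new LoggingCounter(); c.add(2);`, the call dispatches through the prototype chain, which plays the role the vtable plays in the compiled C++. Multiple inheritance would need the extra mixin-style copying mentioned above and is not shown.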
David Given <dg at cowlark.com> writes:

> The obvious place to start on this is the C backend, except in these 2.8
> days the C backend is so hedged about with caveats I'm rather wary of
> basing anything on it. I also recall seeing comments here that it's due
> for a rewrite from scratch, and that various people were looking into
> it. Can anyone go into more detail as to what exactly is wrong with the
> C backend, and whether this rewrite is happening?

The rewrite is happening. I've got the skeleton of the codegen done,
but I have to get it to build before I can check it in. After that,
everyone can start adding patterns.

The main problem with the current C backend is that there is no legalize
phase. So you end up seeing vector types and all sorts of non-C
nonsense. It's just overall much cleaner to generate code using the
generic framework.

> The other thing I could do is to use the LLVMTargetMachine and treat my
> HLL as a low-level machine; this gets me a certain amount of good stuff
> like register allocation and more optimisations, but the documentation
> is still pretty basic (e.g.
> http://wiki.llvm.org/Absolute_Minimum_Backend is three short paragraphs)
> and I'm not certain as to whether LLVMTargetMachine is suitable. For
> example: my HLL can largely be treated as a register machine with an
> arbitrary number of registers. Can LLVMTargetMachine handle this?

Once I get the new C backend checked in (next week, hopefully), it may
be helpful as a guide.

-Dave

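To make the "no legalize phase" point concrete, here is a toy sketch of what legalization buys a C emitter: vector operations are rewritten into scalar ones before any code is printed. The instruction shape (`op`, `type`, `dst`, `a`, `b`), the `v4f32`-style type names and both function names are invented for this illustration; LLVM's real legalizer works on its own internal representation and is far more involved.

```javascript
// Map from toy opcode to the C operator it prints as (assumed opcodes).
var C_OPERATORS = { add: "+", mul: "*" };

// Rewrite every instruction whose type is a vector ("v<N>f32") into
// N scalar instructions, one per lane. Scalar instructions pass through.
function legalize(instructions) {
  var out = [];
  instructions.forEach(function (inst) {
    var match = /^v(\d+)(f32)$/.exec(inst.type);
    if (!match) {
      out.push(inst);          // already expressible in plain C
      return;
    }
    var lanes = parseInt(match[1], 10);
    for (var i = 0; i < lanes; i++) {
      out.push({
        op: inst.op,
        type: match[2],        // element type, here always "f32"
        dst: inst.dst + "_" + i,
        a: inst.a + "_" + i,
        b: inst.b + "_" + i
      });
    }
  });
  return out;
}

// Print one legalized (scalar) instruction as a C statement.
function emitC(inst) {
  return inst.dst + " = " + inst.a + " " + C_OPERATORS[inst.op] + " " + inst.b + ";";
}
```

With this in place, `legalize([{ op: "add", type: "v4f32", dst: "r", a: "x", b: "y" }]).map(emitC)` yields four scalar assignments, so the emitter itself never has to know that vector types exist.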
Chris Lattner <clattner at apple.com> writes:

> On Jan 24, 2011, at 2:01 PM, David Given wrote:
>
>> I am interested in using LLVM to translate C and C++ into high-level
>> language code. (As an update to an earlier project of mine, Clue, which
>> used the Sparse compiler library to do this: it targets Lua, Javascript,
>> Perl 5, C, Java and Common Lisp, with a disturbing amount of success.
>> See http://cluecc.sourceforge.net for details.)
>
> If you're familiar with Sparse, then I strongly recommend basing this
> project on Clang ASTs, not basing it on LLVM IR.

I completely agree. I forgot to add this to the end of my previous
response. :) LLVM IR throws too much information away to target a HLL
effectively.

-Dave

On 25/01/11 00:17, David A. Greene wrote:
[...]
> The rewrite is happening. I've got the skeleton of the codegen done,
> but I have to get it to build before I can check it in. After that,
> everyone can start adding patterns.

Is the new C backend 'register' based, that is, generating lots of
little statements operating on lots of variables, rather than producing
the huge mangled expressions that the old one does? If so, that would be
ideal for what I want.

[...]
> Once I get the new C backend checked in (next week, hopefully), it may
> be helpful as a guide.

Excellent --- I'll wait for that, then. Will it be announced here?

[...]
> LLVM IR throws too much information away to target a HLL
> effectively.

The thing is, I explicitly don't want to use the Clang AST --- I'm not
interested in producing an idiomatic translation, merely a
fast-performing one. Clue in its current lousy state has proven that
this is possible; without any optimisation I'm getting C-to-Java at 60%
of native, and C-to-Luajit at 10%. I now want to see what sort of
results I get when applying LLVM's optimisations and some more
intelligence to the code generation. (Plus, Sparse is buggy and really
awkward to work with.)

For giggles, here's some example Javascript produced by Clue.

    function _dtime(fp, stack) {
        var sp;
        var H0;
        var H1;
        var H2;
        var H3;
        var H4;
        var state = 0;
        for (;;) {
            switch (state) {
                case 0:
                    sp = 2;
                    sp = fp + sp;
                    H1 = null;
                    H0 = 0;
                    H2 = fp;
                    H3 = _gettimeofday;
                    H4 = H3(sp, stack, H2, stack, H0, H1);
                    H0 = fp;
                    H1 = stack[H0 + 0];
                    H0 = fp;
                    H2 = stack[H0 + 1];
                    H0 = 1000000.000000;
                    H3 = H2 / H0;
                    H0 = H1 + H3;
                    return H0;
            }
        }
    }

(PS. Can people please reply to the list instead of to me directly?)

--
┌─── dg@cowlark.com ───── http://www.cowlark.com ─────
│
│ life←{ ↑1 ⍵∨.^3 4=+/,¯1 0 1∘.⊖¯1 0 1∘.⌽⊂⍵ }
│ --- Conway's Game Of Life, in one line of APL

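For readers who want to run the Clue output above, the following is a minimal harness. It rests on assumptions read off the call site rather than on Clue's documentation: pointers appear to be passed as (offset, base array) pairs, every function appears to take `sp` and `stack` first, and the two slots at `stack[fp + 0]` and `stack[fp + 1]` appear to hold the seconds and microseconds of a `struct timeval`. The `_gettimeofday` stub is a hypothetical stand-in for Clue's runtime library.

```javascript
// The generated function from the message, reproduced unchanged.
function _dtime(fp, stack) {
  var sp; var H0; var H1; var H2; var H3; var H4;
  var state = 0;
  for (;;) {
    switch (state) {
      case 0:
        sp = 2;
        sp = fp + sp;
        H1 = null;
        H0 = 0;
        H2 = fp;
        H3 = _gettimeofday;
        H4 = H3(sp, stack, H2, stack, H0, H1);
        H0 = fp;
        H1 = stack[H0 + 0];
        H0 = fp;
        H2 = stack[H0 + 1];
        H0 = 1000000.000000;
        H3 = H2 / H0;
        H0 = H1 + H3;
        return H0;
    }
  }
}

// Hypothetical runtime stub (an assumption, not Clue's real library):
// stores seconds and microseconds through the (tvOffset, tvBase) pointer
// pair and ignores the timezone pointer pair, which the caller passes as
// (0, null).
function _gettimeofday(sp, stack, tvOffset, tvBase, tzOffset, tzBase) {
  var ms = Date.now();
  tvBase[tvOffset + 0] = Math.floor(ms / 1000);   // tv_sec
  tvBase[tvOffset + 1] = (ms % 1000) * 1000;      // tv_usec
  return 0;
}

// Run with an empty array as the simulated C stack and frame pointer 0.
var stack = [];
var seconds = _dtime(0, stack);
```

After the call, `seconds` holds the current time as fractional seconds since the epoch. The sequence of single-assignment-style statements on `H0` to `H4` is the "register" style asked about earlier in the message.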