I recently subscribed to the LLVM and Clang developer mailing lists, and earlier today read a message requesting advice on "Contributing to LLVM Projects". I'm hoping to eventually get involved as well, but with my situation being very different to that described in the referenced thread I got the idea of solliciting advice seperately. If this is not the right place to ask, though, then please let me know and ignore the remainder of this message. Right now, I only 'really' know Java for my current job, save for some Prolog, Haskell and Coq that I picked up before during my linguistics studies (specializing in categorial grammar, and having supplemented with courses from CS and AI on logic and type theory). I thus still have a long road ahead, already when it comes to just bare theory, let alone implementation. The following is a detailed study plan that I have come up with for my spare time that I think will last me for the next 1.5 years or so (naturally to be supplemented with the LLVM documentation), and which I would be very happy to obtain any feedback on. Instead of beginning with C++, I rather started studying C from K&R and (once finished) the book by Fraser and Hanson on LCC. Afterwards, I already have a modest reading list prepared of some articles on SSA (see below) which I hope to combine with a study of the LLVM IR. Based on these readings, as a matter of exercise I can take what I learnt from LCC to write a small compiler in C targeting the LLVM IR for, say, a subset of Prolog. Ideally, I should reach this stage by the end of this year. While far from 'done' with C, I suppose that at this point I can also proceed to pick up some C++ (having in mind at least Accelerated C++ and the Effective (Modern) C++ books, coupled perhaps with Modern C++ Design for template metaprogramming), by which time I think I should be prepared to start following some of the developments around Clang and start digging into its code. All this, in turn, should keep me busy during the first half of the next year at least. I hope, however, to also come to thoroughly understand the LLVM backend, and I would suppose that, besides following literature on compiler optimizations, going through a book or two on computer architecture should give me a start to dig deeper into its source (thinking currently of Computer Systems: A Programmer's Perspective). There will surely never be any shortage of good books to read (in particular hoping to also squeeze in Sedgewick's algorithms in C and a book on OS architecture somewhere), but I hope all this at least should give me a good start from which to start planning further, and hopefully eventually get involved somehow. However, given my current state of knowledge and being limited to my spare time, I'm guessing I'll be looking at at least five years or so to reach a lower intermediate level understanding of most of the important concepts involved. As for the articles that I had in mind for coming to grips with the basics of SSA, I came up with the following list, planning to first gradually work my way up to an understanding of control- and data dependence. 1) Prosser '59: Applications of Boolean matrices to the analysis of flow diagrams <Introduces the concept of dominators, I believe.> 2) Lowry & Medlock '69: Object code optimization <For further discussion of the dominator concept.> 3) Allen '70: Control flow analysis 4) Allen & Cocke '76: A program data flow analysis procedure 5) Ferrante et al. '87: The program dependence graph and its use in optimization 6) Rosen et al. '88: Global value numbers and redundant computations 7) Alpern et al. '88: Detecting equality of variables in programs 8) Cytron et al. '91: Efficiently computing static single assignment form and the control dependence graph Many thanks for taking the time to read through all this. I would be very happy to receive any feedback on these plans, and again, if this was not the right place to ask, then my apologies. Kind regards, Arno Bastenhof -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150803/1674513d/attachment.html>
On 3 August 2015 at 16:29, Arno Bastenhof <arnobastenhof at gmail.com> wrote:> Based on these > readings, as a matter of exercise I can take what I learnt from LCC to write > a small compiler in C targeting the LLVM IR for, say, a subset of Prolog.Hi Arno, This is not a monumental task. There are many tutorials in the LLVM tree, and producing IR from a simple AST is actually simple and straight-forward. So many languages can target LLVM IR, that it's very hard that you'll find something that cannot be represented in IR. At least something that other implementations haven't found already, and worked around. With your broad knowledge in CS, I wouldn't expect you to take more than a month or two to get a front-end working with the JIT compiler on a reasonable subset of Prolog. Of course, the deeper you go, the more time it takes to progress the same 1%, but you can always stop when productivity drops as fast as enjoyment. :) But you won't need to understand the back-end at all, modulo some quirks on your platform of choice. The hint is to write the same logic in Prolog and C, compile the C with Clang and see what they do. If you spot hackery, the least resistance path is to replicate the hackery. Asking on the list is also a good way to understand why the hackery, and maybe even give you an entry point into changing the middle/back end to support a proper construct in IR for better codegen.>From there, I'd recommend you to try different targets. Get someboard, device, phone you have at your disposal (RaspberryPi2 is a great inexpensive kit), and make it work. You'll take a lot less effort than you're imagining... cheers, --renato
When I set out on my "Pascal Compiler project", I had never written a compiler before - I had SOME experience in writing interpreters and other parser/parser-like projects. That's about 14 months ago. I think I started around end of February, and by the beginning of march, when I pushed the first commit to github, I already had small programs working. I only spend (some of my) evenings and weekends on this project, and there have been periods where I didn't do any work for several weeks for one reason or another. I'm now at the a stage where the parser and semantic analysis is parsing the iso7185pat.pas ("pat = Pascal Acceptance Test"). It's not yet passing the code-gen phase - the current stumbling block is trampolining for nested function pointer calls, and I think there are going to be a few other problems along the way. Given that this file is "trying to do everything the ISO7185 standard allows", it's not surprising it finds a few interesting issues... I have a few decent size home-grown tests, such as a sudoku solver that "works" - and because LLVM provides good optimisation, my compiler beats the "regular" Free Pascal compiler on all like-for-like comparisons I've made so far - in particular, the Sudoku solver is about 2x faster with my compiler than Free Pascal (ok, I cheat, some 5% of that comes from having implemented a "popcount" intrinsic that can take a "set" and give the number of elements in the set, which standard pascal or Free Pascal doesn't have). Of course, like Renato says, for every small step towards "full functionality" gets harder - and the good thing with a hobby project is the ability to choose how much time you spend on what features, and how you implement them. I thoroughly recommend "try it, see how it goes". I've found it surprisingly easy... [Mostly because other people have done the REALLY hard part of making the LLVM IR to machine code phase! - Thanks guys!] -- Mats On 3 August 2015 at 18:07, Renato Golin <renato.golin at linaro.org> wrote:> On 3 August 2015 at 16:29, Arno Bastenhof <arnobastenhof at gmail.com> wrote: > > Based on these > > readings, as a matter of exercise I can take what I learnt from LCC to > write > > a small compiler in C targeting the LLVM IR for, say, a subset of Prolog. > > Hi Arno, > > This is not a monumental task. There are many tutorials in the LLVM > tree, and producing IR from a simple AST is actually simple and > straight-forward. > > So many languages can target LLVM IR, that it's very hard that you'll > find something that cannot be represented in IR. At least something > that other implementations haven't found already, and worked around. > > With your broad knowledge in CS, I wouldn't expect you to take more > than a month or two to get a front-end working with the JIT compiler > on a reasonable subset of Prolog. Of course, the deeper you go, the > more time it takes to progress the same 1%, but you can always stop > when productivity drops as fast as enjoyment. :) > > But you won't need to understand the back-end at all, modulo some > quirks on your platform of choice. The hint is to write the same logic > in Prolog and C, compile the C with Clang and see what they do. If you > spot hackery, the least resistance path is to replicate the hackery. > Asking on the list is also a good way to understand why the hackery, > and maybe even give you an entry point into changing the middle/back > end to support a proper construct in IR for better codegen. > > From there, I'd recommend you to try different targets. Get some > board, device, phone you have at your disposal (RaspberryPi2 is a > great inexpensive kit), and make it work. You'll take a lot less > effort than you're imagining... > > cheers, > --renato > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150803/e7b4f2ed/attachment.html>
Good recommendations on the thread already, but let me add by saying you don't _need_ to learn C or C++ to use LLVM; of course contributions would require familiarity with C++. You can use a library that talks to LLVM in your favorite language, and if that's Java, this seems to be the most up-to-date project with bindings: https://github.com/robovm/robovm/tree/master/llvm Let me also critique your reading list if your goal is writing a first compiler. Understanding SSA, optimizations, etc. is 10x more lucid (and you appreciate it more) when you see some IR or assembly and the impact a certain optimization or concept can have on it. Let me be clear, your reading list is bang on for some topics, but I feel like you'll miss some "aha" moments if you've not actually written some code first-hand, or maybe gone through some simple code generation steps (even if it's reading code). In college, I took a course (and did well) on compiler optimizations (program analysis to be specific) and we went through pretty much all what is needed (theory) to build an optimizing compiler: cfg construction, dataflow frameworks, SSA form, fixed points, lattices and abstract interpretation, etc. but a few years hence without using any of that it's been somewhat of a re-learning this past year or so. The formal education there was absolutely critical for the concepts, but I didn't get something tangible at the end of it. If optimizations are really what you want to understand or implement, might I suggest starting with the one of the simplest ones: Constant folding. It's simple to implement yet there is a feeling of accomplishment. Finally, to be honest since learning is an individual experience my suggestions may not at all be helpful, but if you learn like I do ... by doing, they just might. P.S. - I have no idea about your level of cs understanding is, so please don't be offended if what I've suggested is too naive or simple for you. On Mon, Aug 3, 2015 at 10:07 AM, Renato Golin <renato.golin at linaro.org> wrote:> On 3 August 2015 at 16:29, Arno Bastenhof <arnobastenhof at gmail.com> wrote: >> Based on these >> readings, as a matter of exercise I can take what I learnt from LCC to write >> a small compiler in C targeting the LLVM IR for, say, a subset of Prolog. > > Hi Arno, > > This is not a monumental task. There are many tutorials in the LLVM > tree, and producing IR from a simple AST is actually simple and > straight-forward. > > So many languages can target LLVM IR, that it's very hard that you'll > find something that cannot be represented in IR. At least something > that other implementations haven't found already, and worked around. > > With your broad knowledge in CS, I wouldn't expect you to take more > than a month or two to get a front-end working with the JIT compiler > on a reasonable subset of Prolog. Of course, the deeper you go, the > more time it takes to progress the same 1%, but you can always stop > when productivity drops as fast as enjoyment. :) > > But you won't need to understand the back-end at all, modulo some > quirks on your platform of choice. The hint is to write the same logic > in Prolog and C, compile the C with Clang and see what they do. If you > spot hackery, the least resistance path is to replicate the hackery. > Asking on the list is also a good way to understand why the hackery, > and maybe even give you an entry point into changing the middle/back > end to support a proper construct in IR for better codegen. > > From there, I'd recommend you to try different targets. Get some > board, device, phone you have at your disposal (RaspberryPi2 is a > great inexpensive kit), and make it work. You'll take a lot less > effort than you're imagining... > > cheers, > --renato > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev