On Mon, 28 Apr 2008 17:54:31 -0400, Gordon Henriksen wrote:

> On Apr 28, 2008, at 17:32, Hendrik Boom wrote:
>
>> In http://llvm.org/docs/FAQ.html, when talking about writing a compiler
>> that uses LLVM (at least I think that's what the FAQ question is
>> asking), the FAQ recommends
>>
>>> # Call into the LLVM libraries code using your language's FFI
>>> (foreign function interface).
>>>
>>> * for: best tracks changes to the LLVM IR, .ll syntax, and .bc
>>>   format
>>> * for: enables running LLVM optimization passes without an
>>>   emit/parse overhead
>>> * for: adapts well to a JIT context
>>> * against: lots of ugly glue code to write
>>
>> Now, which particular libraries would that be
>
> With the exception of the 'util' and 'tools' directories, the entire
> LLVM source tree consists of libraries.

Indeed, quite a lot of them. Most of them appear to be internal.
I'm trying to identify the ones that are intended for use by LLVM users.

I have to say I missed the crucial paragraph:

: If you go with the first option, the C bindings in include/llvm-c
: should help a lot, since most languages have strong support for
: interfacing with C. The most common hurdle with calling C from managed
: code is interfacing with the garbage collector. The C interface was
: designed to require very little memory management, and so is
: straightforward in this regard.

Evidently I have to go look in include/llvm-c, since I strongly suspect
you didn't go to the trouble of writing a C wrapper for anything that
wasn't needed by an LLVM user. Anything internal you'd have left in C++.

So the API for a C++ *user* could be described as "those parts of the
internals API that happen to be used in implementing llvm-c".

What I found in llvm-c was Core.h. Is that what I need to know for
writing a compiler front-end?

Let's see. Core.h seems to describe building the LLVM code. BitWriter
says how to write it to a file, should that be desired. It's not clear
what lto.h, Analysis.h, and ExecutionEngine.h do or why I'd need them.
Target.h looks useful if I have to include machine dependencies in my
code generator. Some things I do may depend on the size of pointers and
the like.

Putting this together with the tutorial, http://llvm.org/docs/tutorial/,
which uses CAML instead of C, I think I may be able to get a clue.

>> where are their API(s) documented?
>
> http://llvm.org/docs/
> http://llvm.org/doxygen/
> http://llvm.org/docs/tutorial/
> etc etc etc.
>
> -- Gordon

The doxygen page describes the complete internal structure of LLVM. It
explicitly says,

; This documentation describes the internal software that makes up LLVM,
; not the external use of LLVM. There are no instructions here on how to
; use LLVM, only the APIs that make up the software. For usage
; instructions, please see the programmer's guide or reference manual.

I haven't yet found a "programmer's guide". The only reference manual
I've found so far was "LLVM Language Reference Manual", linked from the
llvm.org/docs page. It describes a programming language with a syntax.
No doubt it is a textual representation of the information to be
transmitted using the API I'm looking for, but it doesn't document the
API.

I can probably find what I'm looking for by prowling the source code
that implements this LLVM language, seeing what it calls, and then
looking up those classes and methods in the doxygen stuff. That's
another way, complementary to guessing the relationship between the
ocaml tutorial and Core.h.

-- hendrik

On 2008-04-29, at 08:41, Hendrik Boom wrote:

> On Mon, 28 Apr 2008 17:54:31 -0400, Gordon Henriksen wrote:
>
>> On Apr 28, 2008, at 17:32, Hendrik Boom wrote:
>>
>>> In http://llvm.org/docs/FAQ.html, when talking about writing a
>>> compiler that uses LLVM (at least I think that's what the FAQ
>>> question is asking), the FAQ recommends
>>>
>>>> # Call into the LLVM libraries code using your language's FFI
>>>> (foreign function interface).
>>>>
>>>> * for: best tracks changes to the LLVM IR, .ll syntax, and .bc
>>>>   format
>>>> * for: enables running LLVM optimization passes without an
>>>>   emit/parse overhead
>>>> * for: adapts well to a JIT context
>>>> * against: lots of ugly glue code to write
>>>
>>> Now, which particular libraries would that be
>>
>> With the exception of the 'util' and 'tools' directories, the entire
>> LLVM source tree consists of libraries.
>
> Indeed, quite a lot of them. Most of them appear to be internal.
> I'm trying to identify the ones that are intended for use by LLVM
> users.

include/llvm is all public (modulo some implementation details as
required by the nature of C++). Private includes are in lib. But realize
that not all users are front-end compilers. A back-end code generator is
also a user of the framework, as is an IR optimization or analysis. The
C++ interfaces support all of these clients equally.

VMCore and BitWriter are the libraries absolutely necessary for any
static compiler that outputs bitcode. You'll likely want Analysis for
the verifier, and Target for memory layout information. That's the
basics.

> I have to say I missed the crucial paragraph:
>
> : If you go with the first option, the C bindings in include/llvm-c
> : should help a lot, since most languages have strong support for
> : interfacing with C. The most common hurdle with calling C from
> : managed code is interfacing with the garbage collector. The C
> : interface was designed to require very little memory management,
> : and so is straightforward in this regard.
>
> Evidently I have to go look in include/llvm-c, since I strongly
> suspect you didn't go to the trouble of writing a C wrapper for
> anything that wasn't needed by an LLVM user. Anything internal you'd
> have left in C++.
>
> So the API for a C++ *user* could be described as "those parts of the
> internals API that happen to be used in implementing llvm-c".

That's a rather poor definition. Only bindings for such features as have
been required are authored. Still, if this helps you make sense of the
framework, then that's fantastic; but remember that it is an imperfect
rule.

Using the C bindings, it's still very important to understand the
underlying C++ object model; otherwise, the type rules for the bindings
will appear to be rather capricious.

> Putting this together with the tutorial,
> http://llvm.org/docs/tutorial/, which uses CAML instead of C, I think
> I may be able to get a clue.

If you're not using ocaml, the C++ tutorial (the first one on that page)
is probably more pertinent, even if you do intend to use the C bindings.
Searching the implementation of the bindings (lib/VMCore/Core.cpp, etc.)
is helpful for "going backwards" from C++ to C once you begin to
understand the object model.

>>> where are their API(s) documented?
>>
>> http://llvm.org/docs/
>> http://llvm.org/doxygen/
>> http://llvm.org/docs/tutorial/
>> etc etc etc.
>>
>> -- Gordon
>
> The doxygen page describes the complete internal structure of LLVM.
> It explicitly says,
>
> ; This documentation describes the internal software that makes up
> ; LLVM, not the external use of LLVM. There are no instructions here
> ; on how to use LLVM, only the APIs that make up the software. For
> ; usage instructions, please see the programmer's guide or reference
> ; manual.
>
> I haven't yet found a "programmer's guide".

http://llvm.org/docs/ProgrammersManual.html

-- Gordon

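The point above about the C bindings' type rules looking capricious without the C++ object model can be illustrated with a small, self-contained sketch. Nothing here is LLVM code: the `toy` namespace, `ToyValueRef`, and every `Toy*` function are hypothetical names invented for the illustration. The sketch only assumes the general pattern of exposing a C++ class hierarchy through one opaque C handle type, where a function that accepts "any value" on the C side may in fact require a particular subclass underneath.

```cpp
#include <cassert>
#include <string>
#include <utility>

// ---- C++ side: a tiny class hierarchy (hypothetical, not LLVM's) ----
namespace toy {
class Value {
public:
  enum class Kind { Constant, Function };
  explicit Value(Kind k) : kind_(k) {}
  virtual ~Value() = default;
  Kind kind() const { return kind_; }

private:
  Kind kind_;
};

class Constant : public Value {
public:
  explicit Constant(long v) : Value(Kind::Constant), value(v) {}
  long value;
};

class Function : public Value {
public:
  explicit Function(std::string n)
      : Value(Kind::Function), name(std::move(n)) {}
  std::string name;
  unsigned blockCount = 0;
};
} // namespace toy

// ---- C side: one opaque handle stands in for the whole hierarchy ----
typedef struct ToyOpaqueValue *ToyValueRef;

// Conversions between the opaque handle and the C++ base class.
static ToyValueRef wrap(toy::Value *v) {
  return reinterpret_cast<ToyValueRef>(v);
}
static toy::Value *unwrap(ToyValueRef v) {
  return reinterpret_cast<toy::Value *>(v);
}

extern "C" {
ToyValueRef ToyConstInt(long v) { return wrap(new toy::Constant(v)); }

ToyValueRef ToyAddFunction(const char *name) {
  return wrap(new toy::Function(name));
}

// Returns 1 if the handle refers to a Function, 0 otherwise.
int ToyIsAFunction(ToyValueRef v) {
  return unwrap(v)->kind() == toy::Value::Kind::Function;
}

// The C signature accepts any ToyValueRef, but the C++ model requires a
// Function. Passing a constant is a usage error that the C type system
// cannot catch; here it trips the assertion in debug builds.
unsigned ToyAppendBasicBlock(ToyValueRef fn) {
  assert(ToyIsAFunction(fn) && "handle must refer to a Function");
  return ++static_cast<toy::Function *>(unwrap(fn))->blockCount;
}

// In this sketch the caller owns every value and must dispose of it.
void ToyDisposeValue(ToyValueRef v) { delete unwrap(v); }
}
```

From the C side, `ToyAppendBasicBlock` looks as if it should accept the result of `ToyConstInt`, and only knowledge of the underlying hierarchy explains why it does not. Reading the binding's implementation alongside the C++ classes is how such rules become visible.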
On Tue, 29 Apr 2008 09:46:35 -0400, Gordon Henriksen wrote:

> On 2008-04-29, at 08:41, Hendrik Boom wrote:
>
>> On Mon, 28 Apr 2008 17:54:31 -0400, Gordon Henriksen wrote:
>>
>>> On Apr 28, 2008, at 17:32, Hendrik Boom wrote:
>>>
>>>> In http://llvm.org/docs/FAQ.html, when talking about writing a
>>>> compiler that uses LLVM (at least I think that's what the FAQ
>>>> question is asking), the FAQ recommends
>>>>
>>>>> # Call into the LLVM libraries code using your language's FFI
>>>>> (foreign function interface).
>>>>>
>>>>> * for: best tracks changes to the LLVM IR, .ll syntax, and .bc
>>>>>   format
>>>>> * for: enables running LLVM optimization passes without an
>>>>>   emit/parse overhead
>>>>> * for: adapts well to a JIT context
>>>>> * against: lots of ugly glue code to write
>>>>
>>>> Now, which particular libraries would that be
>>>
>>> With the exception of the 'util' and 'tools' directories, the entire
>>> LLVM source tree consists of libraries.
>>
>> Indeed, quite a lot of them. Most of them appear to be internal. I'm
>> trying to identify the ones that are intended for use by LLVM users.
>
> include/llvm is all public (modulo some implementation details as
> required by the nature of C++). Private includes are in lib. But realize
> that not all users are front-end compilers. A back-end code generator is
> also a user of the framework, as is an IR optimization or analysis. The
> C++ interfaces support all of these clients equally.
>
> VMCore and BitWriter are the libraries absolutely necessary for any
> static compiler that outputs bitcode. You'll likely want Analysis for
> the verifier, and Target for memory layout information. That's the
> basics.
>
>> I have to say I missed the crucial paragraph:
>>
>> : If you go with the first option, the C bindings in include/llvm-c
>> : should help a lot, since most languages have strong support for
>> : interfacing with C. The most common hurdle with calling C from
>> : managed code is interfacing with the garbage collector. The C
>> : interface was designed to require very little memory management,
>> : and so is straightforward in this regard.
>>
>> Evidently I have to go look in include/llvm-c, since I strongly
>> suspect you didn't go to the trouble of writing a C wrapper for
>> anything that wasn't needed by an LLVM user. Anything internal you'd
>> have left in C++.
>>
>> So the API for a C++ *user* could be described as "those parts of the
>> internals API that happen to be used in implementing llvm-c".
>
> That's a rather poor definition. Only bindings for such features as have
> been required are authored. Still, if this helps you make sense of the
> framework, then that's fantastic; but remember that it is an imperfect
> rule.
>
> Using the C bindings, it's still very important to understand the
> underlying C++ object model; otherwise, the type rules for the bindings
> will appear to be rather capricious.
>
>> Putting this together with the tutorial,
>> http://llvm.org/docs/tutorial/, which uses CAML instead of C, I think
>> I may be able to get a clue.
>
> If you're not using ocaml, the C++ tutorial (the first one on that page)
> is probably more pertinent, even if you do intend to use the C bindings.
> Searching the implementation of the bindings (lib/VMCore/Core.cpp,
> etc.) is helpful for "going backwards" from C++ to C once you begin to
> understand the object model.
>
>>>> where are their API(s) documented?
>>>
>>> http://llvm.org/docs/
>>> http://llvm.org/doxygen/
>>> http://llvm.org/docs/tutorial/
>>> etc etc etc.
>>>
>>> -- Gordon
>>
>> The doxygen page describes the complete internal structure of LLVM. It
>> explicitly says,
>>
>> ; This documentation describes the internal software that makes up
>> ; LLVM, not the external use of LLVM. There are no instructions here
>> ; on how to use LLVM, only the APIs that make up the software. For
>> ; usage instructions, please see the programmer's guide or reference
>> ; manual.
>>
>> I haven't yet found a "programmer's guide".
>
> http://llvm.org/docs/ProgrammersManual.html

Thanks. Here's what I have in mind to do with LLVM.

I have a few languages to compile; all of them require garbage
collection. I'll be looking at the ocaml experience with some interest.
How far I get into implementing them depends on the available time and
the state of my enthusiasm. It has been known to go missing, and it
often gets diverted to so-called real life.

One of these languages, Algol 68, I was working on about 35 years ago.
It was not finished, mainly because at some point the machinery I was
developing it on became unavailable. It correctly ran over half of a
demanding test suite when the project stopped. It's now something I'd
like to finish more for old times' sake than for any serious use.

35 years ago, this compiler would run in about 900K of memory. That was
a dream machine back then. Using an overlay linker, it could be crammed
into 400K. It was written in Algol W, and could use a new portable code
generator. It used garbage collection at compile time, but on today's
machines I could probably get away with wholesale memory leakage.

To get it working, of course, I need something that implements Algol W.
I've tinkered with translating Algol W to C or something similar. I
originally intended to translate the Algol 68 compiler into Algol 68, to
make it self-supporting, but I never got that far. I have an Algol W
parser, and at least one ancient attribute grammar that (too slowly)
translates it to something else. Since I'll only be using it to develop
Algol 68, which runs in 900K, I can probably dispense with garbage
collection and just use my 4 gigabytes of RAM instead.

I also have a self-implementing program-transformation tool. It consists
of a recursive-descent parser generator, a tree-rewriting system, and an
unparser. In principle, it needs garbage collection. In practice, well,
I've said it before. Memories are large these days.

-- hendrik

>
> -- Gordon
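The "wholesale memory leakage" idea from the message above, a short-lived compiler that allocates and never frees, can be sketched as a bump allocator. This is only an illustration of the strategy and not anything from the thread: `LeakyArena`, its members, and the 1 MiB default chunk size are hypothetical, and the sketch assumes requested alignments do not exceed `alignof(std::max_align_t)`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <utility>

// Hypothetical "leaky" arena: memory is carved out of large chunks and is
// never returned. Destructors of allocated objects are never run.
class LeakyArena {
public:
  explicit LeakyArena(std::size_t chunkSize = 1 << 20)
      : chunkSize_(chunkSize) {}

  // Returns `size` bytes aligned to `align` (a power of two no larger
  // than alignof(std::max_align_t)).
  void *allocate(std::size_t size,
                 std::size_t align = alignof(std::max_align_t)) {
    assert(align != 0 && (align & (align - 1)) == 0);
    // Round the current position up to the requested alignment.
    std::size_t offset = (used_ + align - 1) & ~(align - 1);
    if (chunk_ == nullptr || offset + size > capacity_) {
      // Start a new chunk; the old one is deliberately leaked, so
      // pointers handed out earlier stay valid.
      capacity_ = size > chunkSize_ ? size : chunkSize_;
      chunk_ = static_cast<char *>(std::malloc(capacity_));
      if (chunk_ == nullptr)
        throw std::bad_alloc();
      offset = 0;
    }
    used_ = offset + size;
    totalBytes_ += size;
    return chunk_ + offset;
  }

  // Constructs a T inside the arena; it is never destroyed.
  template <typename T, typename... Args> T *make(Args &&...args) {
    return new (allocate(sizeof(T), alignof(T)))
        T(std::forward<Args>(args)...);
  }

  // Total bytes requested so far (excluding alignment padding).
  std::size_t bytesAllocated() const { return totalBytes_; }

private:
  std::size_t chunkSize_;
  char *chunk_ = nullptr;
  std::size_t capacity_ = 0;
  std::size_t used_ = 0;
  std::size_t totalBytes_ = 0;
};
```

The trade-off is the one described in the message: allocation becomes a pointer bump, and the cost is that peak memory equals total memory ever allocated, which is tolerable only when the program's lifetime allocations fit comfortably in RAM.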