Well, as it seems I need to bother everyone on the list with my pointless newbie questions, here goes. Maybe there was a FAQ for all this, but I missed it. I am not trying to demean LLVM in any way here, only trying to understand and evaluate things from my point of view... sorry if I seem at all arrogant or condescending.

Running a line counter, LLVM is about 223 kloc of C++, plus 72 kloc of headers. That is decently large. My compiler, in comparison, is only about 62 kloc thus far, but I have about 300 kloc of 3D code in my main project, so it balances out, I guess. I also currently only do x86 (at present this is all I really need for my uses).

Skimming, it does not look like LLVM uses its own assembler or linker? (Or at least, I can't seem to find anything looking like one in the codebase anywhere.) I am assuming then that some external assembler is used (such as 'gas')?

Actually, I don't really understand the code so well. I am a C-head (almost never using C++), and have very different organization and structuring practices (I learned much of my coding practice from things like the Quake source and the Linux kernel, and I got into compilers and interpreters initially by skimming over the Guile source at one point), so it is hard to figure out just how the parts go together and how things work. Internally, the compilers look *very* different (I am right now doubting there is much of anything similar).

It looks like LLVM is more intended to work as a static compiler, or for digging around in the compiler machinery? It also looks like much of the interconnection and data sharing is done through objects and templates? (I usually gain modularity by use of abstraction; for example, several of my major components work by moving data between compiler stages serialized as large chunks of ASCII text. As a result, some of my APIs end up looking more than a little like printf.)

It doesn't appear that working as a dynamic compiler is a major design goal (from what I hear, it can be used this way, but this is not the focus). So, it looks like the design focuses mostly on taking the input modules, grinding and mixing them, and doing lots of spiffy inter-module optimizations (presumably forming a monolithic output representing the entire project?). Very different.

Mine is very different: I produce masses of object modules (in the more traditional sense), which are linked at runtime and kept in an in-memory, largely-but-not-completely-linked form. Dynamic relinking is a design goal in my case, so it is good to be able to unlink a module from the running image and/or relink a new module in its place (hopefully, if possible, without crashing the app in the process). As a result, my compiler generally refrains from inlining things or doing brittle inter-function optimizations (after all, one could potentially relink parts of the image and break things).

Basically, I mostly wanted to be able to tweak C code in much the same way as is possible with Lisp: going beyond the confines of what is possible with compilers like gcc, or with existing interpreter-based scripting VMs, while trying to preserve both the speed and capability of a true compiler and the flexibility and dynamic changeability of a typical interpreter-based VM.

So, basic questions: how well would LLVM work for being used in a manner comparable to Lisp-style eval (or Self, Smalltalk, or Python style incremental restructuring)? And for incrementally replacing functions or modules at runtime?
Or is the intention more like "compile once and use"? Would this likely imply redoing much of the compilation process for each incremental change? Or am I missing something major here?

As a result of this design, the assembler and the linker form the core around which nearly my entire compiler framework is built (the assembler and linker core form the base platform, and the upper and lower compilers allow me to use it with something other than assembly and object files). The lower compiler (RIL to ASM, 16.4 kloc) was originally intended to be a replaceable component, but ended up being a little larger and more complicated than I had hoped (so I fudged it, and largely implemented both x86 and x86-64 support in the same version). It may eventually be split into upper-lower and lower-lower portions (the upper-lower managing mostly the abstract machine and type mechanics, and the lower-lower handling the codegen).

The upper compiler (C to RIL) is similarly intended to be replaceable (an 'Embedded C++' frontend, for example, would likely come about by simply replacing the whole frontend). Actually, for technical reasons, I may eventually replace it anyway: the current upper compiler was just sort of hacked together mostly from pre-existing code I had lying around (mostly, it was derived from the compiler of the immediate predecessor project, which was an interpreted, and later JIT-compiled, scripting language).

So, my thought: the projects seem sufficiently different that about the only real similarity I can see so far is that both are compilers. The goals are just probably not the same as my goals, is all. Still, I encourage the developers of this project to keep up the good work. This is an interesting project, in any case. Or such...
Gordon Henriksen
2007-Oct-23 15:45 UTC
[LLVMdev] me being stupid: me vs the llvm codebase...
On Oct 23, 2007, at 05:52, BGB wrote:

> I am assuming then that some external assembler is used (such as 'gas')?

In the static compilers, yes. The JIT directly serializes instructions into memory without the aid of an external assembler. There are also experimental built-in assemblers; LLVM calls them object writers. [1]

> it looks like much of the interconnection and data sharing is done
> through objects and templates?

That's correct. The LLVM intermediate representation (IR) is well suited for many transformations and analyses, which are generally structured as passes. [2]

The LLVM IR has object-oriented [3], textual (.ll) [4], and binary (.bc "bitcode") [5] representations; all are fully equivalent. However, it is more efficient not to wring the program through multiple print/parse or write/read cycles, so the object-oriented representation is generally maintained within any single process.

The code generators also convert the program into the SelectionDAG and MachineFunction forms, both of which are target-independent in form but not in content. [6] Each of these forms has multiple states with differing invariants. (Strictly speaking, however, these forms are private to each code generator; the C backend does not use either.) These code generation forms do not have first-class textual or binary representations, since they are ephemeral data structures used only during code generation. They can, however, be dumped to human-readable text or viewed with Graphviz.

> it doesn't appear that working as a dynamic compiler is a major design
> goal (from what I hear, it can be used this way, but this is not the focus).
>
> so, it looks like the design focuses mostly on taking the input modules,
> grinding and mixing them, and doing lots of spiffy inter-module
> optimizations (presumably forming a monolithic output representing the
> entire project?).

LLVM does work well as a static (offline) compiler, where inter-procedural optimization and link-time optimization are useful. In llvm-gcc, link-time optimization ("mixing," as you say) only occurs at -O4. Typically, IPO is performed only within a single compilation unit (-O3/-O2). No IPO is performed at -O0.

> as a result, my compiler generally refrains from inlining things or
> doing brittle inter-function optimizations (after all, one could
> potentially relink parts of the image and break things).

It's possible to use LLVM in the same manner by simply refraining from the use of inter-procedural optimizations. If LLVM bitcode is used as the on-disk representation, however, LLVM would allow the use of offline optimizations before starting the JIT program. This could include IPO or LTO at the developer's option, and would be entirely safe if the unit of dynamism were restricted to an LLVM module, since LTO merges modules together.

> how well would LLVM work for being used in a manner comparable to
> Lisp-style eval (or Self, Smalltalk, or Python style incremental
> restructuring)?

Simply codegen the string into a function at runtime, JIT it, and call it. [7] Afterwards, the IR and the machine code representation can be deleted.

> and incrementally replacing functions or modules at runtime?

Generally speaking, LLVM neither helps nor hinders here. Maybe someone will follow up with whether the JIT uses stub functions, which would enable dynamic relinking. If not, it would be a straightforward, if platform-specific, feature to add.
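To make that eval-style flow concrete, a minimal sketch against the C++ API might look roughly like this (exact class names, headers, and the ModuleProvider plumbing have varied between LLVM releases, so treat the details as approximate):

    #include "llvm/Module.h"
    #include "llvm/DerivedTypes.h"
    #include "llvm/Constants.h"
    #include "llvm/Instructions.h"
    #include "llvm/ModuleProvider.h"
    #include "llvm/ExecutionEngine/ExecutionEngine.h"
    using namespace llvm;

    int main() {
      // Build a module holding a function "i32 @evalfn()" that returns 42;
      // a real eval would instead build this IR from the user's string.
      Module *M = new Module("eval");
      Function *F = cast<Function>(
          M->getOrInsertFunction("evalfn", Type::Int32Ty, (Type *)0));
      BasicBlock *BB = new BasicBlock("entry", F);
      new ReturnInst(ConstantInt::get(Type::Int32Ty, 42), BB);

      // JIT-compile the function and call it through the native entry point.
      ExecutionEngine *EE =
          ExecutionEngine::create(new ExistingModuleProvider(M));
      int (*fp)() = (int (*)())EE->getPointerToFunction(F);
      int result = fp();

      // Once called, both the machine code and the IR can be discarded.
      EE->freeMachineCodeForFunction(F);
      F->eraseFromParent();
      return result;
    }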
— Gordon

[1] http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/ELFWriter.cpp?view=markup
    http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/MachOWriter.cpp?view=markup
[2] http://llvm.org/docs/WritingAnLLVMPass.html
[3] http://llvm.org/docs/ProgrammersManual.html#coreclasses
[4] http://llvm.org/docs/LangRef.html
[5] http://llvm.org/docs/BitCodeFormat.html
[6] http://llvm.org/docs/CodeGenerator.html
[7] Watch this space, currently under rapid construction: http://llvm.org/docs/tutorial/
    In particular, observe the HandleTopLevelExpression function in §3.3 "Implementing Code Generation to LLVM IR." That function will be extended to handle the eval usage in §3.4 "Adding JIT and Optimizer Support."
----- Original Message -----
From: "Gordon Henriksen" <gordonhenriksen at mac.com>
To: "LLVM Developers Mailing List" <llvmdev at cs.uiuc.edu>
Sent: Wednesday, October 24, 2007 1:45 AM
Subject: Re: [LLVMdev] me being stupid: me vs the llvm codebase...

>> I am assuming then that some external assembler is used (such as 'gas')?
>
> In the static compilers, yes. The JIT directly serializes instructions
> into memory without the aid of an external assembler. There are also
> experimental built-in assemblers; LLVM calls them object writers. [1]

OK; I just couldn't find them, is all.

Yeah, I once started doing direct instruction serialization (aided by an 'assembler' that was in fact a mass of functions using a procedural interface). Eventually I decided that this route was just too inconvenient, and it was largely replaced by a printf-style interface (using a variation of NASM-style syntax, though with a few differences, such as allowing multiple opcodes to be lumped onto one line). Later on, though, the internals of the codegen went partly back to such an interface, mostly because of the need to abstract things (generating different possible instruction sequences depending on context and so on, which could later become the basis of 'VAS').

>> it looks like much of the interconnection and data sharing is done
>> through objects and templates?
>
> That's correct. The LLVM intermediate representation (IR) is well suited
> for many transformations and analyses, which are generally structured as
> passes. [2] The LLVM IR has object-oriented [3], textual (.ll) [4], and
> binary (.bc "bitcode") [5] representations; all are fully equivalent.
> However, it is more efficient not to wring the program through multiple
> print/parse or write/read cycles, so the object-oriented representation
> is generally maintained within any single process.

Yes, OK. I generally use whatever representation is convenient within a process, but many of my stages communicate textually. In part, this gives an easier way to inspect what is going on (I can look at a dump of the preprocessor output, the parse trees, the RIL code, or the assembler and see how things are working). In part, it also maintains isolation between the stages, since many of them represent data differently. Serializing to text and parsing from text allows a good deal of abstraction: one part unwinds and dumps its output informally into a big text buffer, while another part winds it back up, potentially building a very different representation of the data (for example, the upper compiler is based around trees, where the lower compiler is based more around buffers and flat arrays, doing funky bit-twiddling to cram much of the typesystem mechanics into a single 32-bit integer). It is also a lot easier to change a printer or a parser than to go through the pain of changing around a bunch of structs and/or modifying a bunch of code for seemingly trivial alterations.

This was in fact a major reason why I created my 'RPNIL' language: I had assumed different languages would target it, so I wanted something hopefully fairly general, so that different frontends could be written without too much interdependence or conflict. This is also why I chose a stack machine, as it offers at least some semblance of machine abstraction. I am also fairly familiar with stack machines, having had a long run of fairly good success with them.
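Going back to the printf-style assembler interface mentioned above, a rough sketch of the general shape (the function and buffer names here are invented for illustration, not the actual API):

    #include <cstdarg>
    #include <cstdio>
    #include <string>

    static std::string asm_text;   // text handed off to the assembler core later

    // Hypothetical printf-style entry point: the caller hands over NASM-ish
    // text, and several opcodes may be lumped onto one line.
    void basm_printf(const char *fmt, ...)
    {
        char line[1024];
        va_list ap;
        va_start(ap, fmt);
        vsnprintf(line, sizeof(line), fmt, ap);
        va_end(ap);
        asm_text += line;          // a later pass parses this text back in
    }

    // usage:
    //   basm_printf("mov eax, %d; add eax, ebx; ret\n", 42);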
A major problem I have run into with the stack model, though, relates to evaluation ordering: the model provides multiple options, but none of them are great. In particular, the x86-64 calling convention leaves a major pain point I have not yet decided how to work around (I need to figure out some good way to either reorder the code or reorder the data within the confines of an abstract stack model). One option, disliked because it would require changing RPNIL, would be to make each function argument a PostScript-style block, with an instruction to indicate that it is an argument (ick).

Also, originally I had assumed the RPNIL compiler itself would be replaceable, but I am starting to doubt this. The original idea when designing RPNIL was that by the time I got to this point I would implement a three-address-code version of the RPNIL compiler (thus making the stack almost purely an abstraction). But my first, and currently only, implementation is a multi-pass linear processor (the first pass determines register usage, stack layout, and so on; the second pass generates code). Going to TAC may be what I do if I split the RPNIL compiler, where the upper half would convert RIL to 'VAS' (just coining a term here, something like "virtual assembler"), which would be more or less machine-specific but would not yet have worked through all the gritty details, such as the exact instruction sequences used to represent the various operations (the VAS stage would be mostly a thin translator working through the instruction-sequence details and spitting out raw assembler). Another previously considered option was compiling from RPNIL to LLVM (LLVM seems to sit somewhat lower-level than RPNIL, but a bit higher-level than what my 'VAS' idea would probably be).

> The code generators also convert the program into the SelectionDAG and
> MachineFunction forms, both of which are target-independent in form but
> not in content. [6] Each of these forms has multiple states with
> differing invariants. (Strictly speaking, however, these forms are
> private to each code generator; the C backend does not use either.)
> These code generation forms do not have first-class textual or binary
> representations, since they are ephemeral data structures used only
> during code generation. They can, however, be dumped to human-readable
> text or viewed with Graphviz.

OK.

>> it doesn't appear that working as a dynamic compiler is a major design
>> goal...
>
> LLVM does work well as a static (offline) compiler, where
> inter-procedural optimization and link-time optimization are useful. In
> llvm-gcc, link-time optimization ("mixing," as you say) only occurs at
> -O4. Typically, IPO is performed only within a single compilation unit
> (-O3/-O2). No IPO is performed at -O0.

Yes, OK. Note, however, that I don't even want to touch the gcc frontend in my projects:
the gcc codebase is a horror I would rather not have the misfortune of dealing with (much less trying to make it think it is something like Lisp or Python).

>> as a result, my compiler generally refrains from inlining things or
>> doing brittle inter-function optimizations...
>
> It's possible to use LLVM in the same manner by simply refraining from
> the use of inter-procedural optimizations.

Possibly, yes.

> If LLVM bitcode is used as the on-disk representation, however, LLVM
> would allow the use of offline optimizations before starting the JIT
> program. This could include IPO or LTO at the developer's option, and
> would be entirely safe if the unit of dynamism were restricted to an
> LLVM module, since LTO merges modules together.

OK, dunno here.

>> how well would LLVM work for being used in a manner comparable to
>> Lisp-style eval (or Self, Smalltalk, or Python style incremental
>> restructuring)?
>
> Simply codegen the string into a function at runtime, JIT it, and call
> it. [7] Afterwards, the IR and the machine code representation can be
> deleted.

OK. How well does this work if, say, we decide to override a globally defined function with a newly defined one?

>> and incrementally replacing functions or modules at runtime?
>
> Generally speaking, LLVM neither helps nor hinders here. Maybe someone
> will follow up with whether the JIT uses stub functions, which would
> enable dynamic relinking. If not, it would be a straightforward, if
> platform-specific, feature to add.

I don't use proxy or stub functions; I relink them. Basically, a system of tables keeps track of where all the various functions and variables are located, and where they are used from. As a result, moving a function or variable causes the linker to go and 'unlink' the references (relocating by the inverse address so that the references point back to NULL), and then 'relink' them (modifying the references to point to the new location). (In general, I don't relocate normal variables, though, largely because this would tend to make them lose their values.) Now, sadly, one has to be pretty hopeful that another thread is not running in the code when this happens, which is the risk.

Likewise, all the modules are kept linked together in a kind of "heap" of sorts. I did it this way so that I can still use object modules and static libraries compiled with gcc in much the same way as code generated by my compiler.

> [7] Watch this space, currently under rapid construction:
> http://llvm.org/docs/tutorial/

Yes, OK.
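For a concrete picture of the sort of bookkeeping that relinking scheme implies, here is a rough sketch (the structure and names are invented for illustration, not the actual implementation):

    #include <stdint.h>
    #include <string.h>
    #include <string>
    #include <vector>

    struct RefSite {
        uint8_t *patch_addr;         // address of a rel32 operand in some caller
    };

    struct Symbol {
        std::string name;
        uint8_t *addr;               // where the function's code currently lives
        std::vector<RefSite> refs;   // every recorded reference to it
    };

    // Re-point all recorded x86 rel32 call sites at a symbol's new location.
    // (A real implementation also has to worry about threads that may still
    // be executing the old copy, as noted above.)
    void relink(Symbol *sym, uint8_t *new_addr)
    {
        sym->addr = new_addr;
        for (size_t i = 0; i < sym->refs.size(); i++) {
            uint8_t *site = sym->refs[i].patch_addr;
            // a rel32 displacement is relative to the end of its 4-byte field
            int32_t disp = (int32_t)(new_addr - (site + 4));
            memcpy(site, &disp, sizeof(disp));
        }
    }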