More questions about writing a high-level language:
Let's say we have a "headerless" language such as Java or C#, where the
compiler gets high-level type information from the compiled object file
rather than from header files, as in C/C++.
So for example, if I have a module that imports symbols from some other
modules, the compiler would check to see if those modules need to be
recompiled. If so, those modules would be parsed and added to the queue
for compilation; otherwise the compiler would simply read their compiled
output files (which is presumably faster than reparsing and recompiling
the source).
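To make the recompile-or-load step concrete, here is a minimal Python
sketch of the traversal. It assumes a hypothetical `imports` map from each
module name to the modules it imports and a hypothetical `is_stale`
predicate; a real compiler would presumably interleave this with parsing
rather than plan everything up front.

```python
from collections import deque

# Hypothetical inputs: `imports` maps each module name to the modules it
# imports, and `is_stale(name)` reports whether that module's compiled
# output is missing or out of date (e.g. by comparing recorded hashes).
def plan_build(root, imports, is_stale):
    """Walk the import graph breadth-first from `root`, splitting modules
    into those that must be recompiled and those whose compiled output
    can simply be loaded."""
    to_compile, to_load = [], []
    seen = {root}
    queue = deque([root])
    while queue:
        name = queue.popleft()
        if is_stale(name):
            to_compile.append(name)  # parse the source, queue for compilation
        else:
            to_load.append(name)     # read the compiled output file instead
        for dep in imports.get(name, ()):
            if dep not in seen:      # `seen` also keeps import cycles from looping forever
                seen.add(dep)
                queue.append(dep)
    return to_compile, to_load


imports = {"app": ["util", "io"], "util": ["core"], "io": ["core"], "core": []}
print(plan_build("app", imports, lambda m: m == "util"))
# (['util'], ['app', 'io', 'core'])
```

The `seen` set is the simplest way to survive circular imports during the
walk; it does not by itself decide compilation order within a cycle.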
What I'm envisioning is a compiler which converts each module into an
intermediate form, containing both the LLVM bitcode and a compressed
version of the high-level types. The type information includes inline
methods and templates, which means that the contents of a module can
affect the compilation of the modules that import it. (Dealing with circular
dependencies will be interesting but not unsolvable.) Each compiled
module would also contain a list of other modules that it depends on,
including a hash of the imported module's content, so that it would be
relatively easy to calculate when a module needs to be rebuilt.
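A minimal sketch of the per-module record and the staleness check follows.
The names (`ModuleRecord`, `needs_rebuild`, `content_hash`) are
hypothetical, SHA-256 is an arbitrary choice of digest, and nothing here
is an existing LLVM facility.

```python
import hashlib
from dataclasses import dataclass, field


def content_hash(data: bytes) -> str:
    # SHA-256 is an arbitrary choice; any stable digest would do.
    return hashlib.sha256(data).hexdigest()


@dataclass
class ModuleRecord:
    """Hypothetical metadata stored with each compiled module."""
    name: str
    source_hash: str  # hash of this module's own source at build time
    # imported module name -> that module's content hash at build time
    deps: dict[str, str] = field(default_factory=dict)


def needs_rebuild(record: ModuleRecord, current_source_hash: str,
                  current_hashes: dict[str, str]) -> bool:
    """A module is stale if its own source changed, or if any module it
    imports now hashes differently (or has disappeared)."""
    if record.source_hash != current_source_hash:
        return True
    return any(current_hashes.get(dep) != h for dep, h in record.deps.items())
```

Since inline methods and templates are part of an imported module's
content, a change to a template body changes that module's hash and marks
every importer as stale, which is the behavior the scheme needs.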
These intermediate files would then be combined by a linker into a native
binary, with all of the high-level type information stripped out (except
in debug builds).
So my first question is: do I need to come up with my own container
format for LLVM bitcode, or is there a way to store the type and
dependency information in the existing format? Or would it be better to
have two output files per source file, one with the LLVM bitcode and
one with the high-level type information? The latter strategy would
allow the stock LLVM linker to be used to create the final application.
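For the two-file option, a rough sketch of the sidecar idea: the bitcode
file is left untouched, so stock LLVM tools can consume it, and the type
and dependency data is written next to it. The `.meta.json` suffix and the
JSON encoding are assumptions chosen for readability; a real compiler
would more likely use a compact binary encoding.

```python
import json
from pathlib import Path


# Hypothetical layout: build/foo.bc (plain LLVM bitcode) sits next to
# build/foo.meta.json (high-level types and dependency hashes).
def sidecar_path(bitcode_path: Path) -> Path:
    return bitcode_path.with_suffix(".meta.json")


def write_sidecar(bitcode_path: Path, types: dict, deps: dict) -> Path:
    path = sidecar_path(bitcode_path)
    path.write_text(json.dumps({"types": types, "deps": deps}, sort_keys=True))
    return path


def read_sidecar(bitcode_path: Path) -> tuple[dict, dict]:
    data = json.loads(sidecar_path(bitcode_path).read_text())
    return data["types"], data["deps"]
```

The cost of this layout is that the two files can drift apart, so the
sidecar would probably need to record a hash of the bitcode it describes.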
Also, any ideas or comments on the general subject would be welcome.
-- Talin