Hi All,

Bruno asked me to write up a short email about the direction we're going with llvm-mc and how it interacts with the object writer stuff. My hope was that .o writing could be basically done without worrying about llvm-mc, but I think we've reached the point where it is useful to talk about how the two will interact in the future. It turns out that I also failed in the goal of writing a *short* email, oh well.

llvm-mc has many goals, but it basically revolves around generating full native assemblers and disassemblers from suitably enhanced versions of the .td files we already use for the code generator. For this email, I'll just focus on the native assembler part.

If you look at writing out a .o file, there are a couple of interesting parts: 1) encoding machine instructions, 2) wrapping an ELF/PECOFF/MACHO container around them, 3) supporting inline assembly, and 4) wiring it into the LLVM backend.

LLVM already handles #1 fairly well. It can certainly be improved, but the JIT works today. Aaron and Bruno are working on #2, integrating .o file writing into the backend but ignoring inline assembly. The integration into llvm (#4) is pretty ugly, because the .o file writers duplicate a ton of logic in the asmprinters. I intentionally asked Bruno to ignore inline assembly.

My goal with llvm-mc is to solve all of these problems, pulling in Bruno and Aaron's work on #2 when the time is right. The ultimate design I'd like to get to looks like this:

1. Implement a full assembler, with a .s parser, a fragment processor, and .o file writer. The main data structures the .o file writer will work on are "fragments", which (at the end of the assembler) are basically chunks of bytes with an associated section and a list of relocations (which is exactly what llvm::BinaryObject is!).

2. The full assembler should work as a drop-in replacement for "as"/"gas". This allows easy testing, and may also be interesting to the BSD folks.
I am slightly worried and paranoid about getting different output when using "llvm-gcc -S + as" and using "llvm-gcc -c", so testing is very important to me.

3. The assembler needs an API between the .s parser and the "assembler backend". This is called MCStreamer so far - it is somewhat similar to the "Actions" interface in clang. This separates the "parsing" from the "fragment construction", relaxation, and other stuff that an assembler does.

4. Since we have a nice MCStreamer API, we plan to implement it in two ways: one way will use the assembler backend to write out a .o file. The other way will just *print out a .s file* that is semantically identical to the input. This is currently starting to limp along, and is a good way to bring up the asm parser without worrying about the "assembler backend" and .o file writing.

5. The LLVM "asmprinter" will change from writing out text directly with raw_ostream calls to making virtual method calls on the MCStreamer API. For example, instead of 'O << "\t.align 2\n";', the asmprinter will end up calling Streamer->EmitValueToAlignment(2). When run in "-S" mode, the implementation of the Streamer will be the .s file writer, which just writes out the .align directive.

6. When the "assembler backend" comes up, this means that we can drop it into the current "asmprinter" and the virtual methods (like EmitValueToAlignment) will just do the right thing for building a .o file. This also means that the compiler can pull in the asmparser to handle inline assembly with no problem.

Taking this sort of approach has a number of advantages. A big problem that I see today is that the .s writer and .o writers use completely different code paths that make almost exactly the same decisions. For example, if separate, they both need to have some equivalent of the X86ATTAsmPrinter::printModuleLevelGV function, which decides what low-level linkage, visibility, and many other aspects of a global to emit.
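
To make points 3-6 concrete, here is a rough sketch of the streamer idea. It is not the real MCStreamer API: the class names (Streamer, AsmTextStreamer, ObjectBytesStreamer), the EmitBytes method, and the helper functions are all hypothetical, and the alignment is assumed to be a byte count (whether ".align N" means bytes or a power of two depends on the target). Only the name EmitValueToAlignment comes from the description above.

```cpp
#include <cassert>
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical, heavily simplified stand-in for the streamer interface.
class Streamer {
public:
  virtual ~Streamer() = default;
  // Assumption: the argument is an alignment in bytes.
  virtual void EmitValueToAlignment(unsigned ByteAlignment) = 0;
  virtual void EmitBytes(const std::vector<uint8_t> &Bytes) = 0;
};

// "-S" flavor: each call is printed as a textual directive.
class AsmTextStreamer : public Streamer {
  std::ostream &OS;

public:
  explicit AsmTextStreamer(std::ostream &Out) : OS(Out) {}

  void EmitValueToAlignment(unsigned ByteAlignment) override {
    OS << "\t.align " << ByteAlignment << "\n";
  }
  void EmitBytes(const std::vector<uint8_t> &Bytes) override {
    for (uint8_t B : Bytes)
      OS << "\t.byte " << static_cast<unsigned>(B) << "\n";
  }
};

// "-c" flavor: each call appends to the raw contents of one section.
class ObjectBytesStreamer : public Streamer {
  std::vector<uint8_t> &Section;

public:
  explicit ObjectBytesStreamer(std::vector<uint8_t> &Sec) : Section(Sec) {}

  void EmitValueToAlignment(unsigned ByteAlignment) override {
    assert(ByteAlignment != 0 && "alignment must be nonzero");
    // Pad with zero bytes until the section size is a multiple of the
    // requested alignment.
    while (Section.size() % ByteAlignment != 0)
      Section.push_back(0);
  }
  void EmitBytes(const std::vector<uint8_t> &Bytes) override {
    Section.insert(Section.end(), Bytes.begin(), Bytes.end());
  }
};

// Stand-in for the asmprinter: the decisions about what to emit live here,
// once, and do not depend on which Streamer is plugged in.
void emitExample(Streamer &S) {
  S.EmitBytes({0x2A});
  S.EmitValueToAlignment(4);
  S.EmitBytes({0x01, 0x02});
}

// Run the same emission code through the text implementation.
std::string emitExampleAsText() {
  std::ostringstream Out;
  AsmTextStreamer S(Out);
  emitExample(S);
  return Out.str();
}

// Run the same emission code through the byte-buffer implementation.
std::vector<uint8_t> emitExampleAsBytes() {
  std::vector<uint8_t> Section;
  ObjectBytesStreamer S(Section);
  emitExample(S);
  return Section;
}
```

The thing to notice is that emitExample is written once: the only difference between the text path and the byte path is which Streamer implementation it is handed.
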
Duplicating all this logic is bad and is likely to lead to -c and -S doing different things. If the only difference between -c and -S is a different implementation of MCStreamer, and if we can independently test that the assembler "does the right thing", we should be in good shape for correctness.

There is clearly some potential for overlap between what the MC stuff is doing and what the .o file writing work is doing. Before I talk about that, I want to talk about the implementation plan for MC.

The plan is to do X86-32 first, then X86-64, then move on to ARM probably, with these steps (many of which are parallel):

1. Make a new llvm-mc test tool that is a driver for various components (done).

2. Build up an asm parser for x86 that can print out a semantically-identical .s file (one that can be assembled to the same bytes as the input). This "complicated cat" is just a testing mechanism, but it also allows implementation of the asm parser to proceed in parallel with other pieces, and will be used by "-S" eventually (in progress).

3. Implement support for tblgen-generating the parsing logic that maps from "opcode name + argument" in the asmparser to an instruction enum value + MCOperands.

4. Refactor the asm printer to go through MCStreamer instead of writing to raw_ostream directly (in progress).

5. Refactor the asm printer to not depend at all on libx86, codegen, vmcore etc. The goal is for x86/asmprinter to only depend (at a link level) on libsupport. There is a lot involved in this, to say the least, but this is important for "llvm-mc as a disassembler API".

6. Implement the "assembler middle end", which handles relaxation operations (e.g. determining whether something is a short or long branch on x86, figuring out how big a uleb has to be, etc), and which generates the final fragments (which are basically going to be llvm::BinaryObjects).

7. Implement support for writing a .o file.

As you can see, llvm-mc is a pretty ambitious and non-trivial project.
Because of this, I don't expect to get to #7 for a couple months at least, and I understand that the .o file work needs to be almost complete by then (for GSoC etc). My goal was for the .o writing work to focus on stuff that doesn't intersect with MC stuff until late in the game: the encoding to ELF, writing DWARF as bytes instead of ".byte"s, etc. My hope was that the .o file writer stuff could just be refactored directly into providing #7, completing the final step.

Coming back to the short-term issue of "we need templates to make byte emission fast in the code generator", I hope that I've convinced you that this will "just not matter". Please focus on making .o file writing be done in a simple, clear and concise way without worrying about micro-level performance optimizations. By the time we're done with this whole project, many things will have changed. Doing optimization of .o file writing at this stage is *way way way* too early.

One nice thing about this design is that (like clang) it will be very easy to decompose the layers and performance test just (e.g.) the .o writer without testing the whole compiler's performance.

-Chris