In order to get to the next stage with LLVM (like compiling a kernel) we need to allow "pass through" of inline assembly so things like device drivers, interrupt vectors, etc. can be written. While this feature breaks the "pure" LLVM IR, I don't see any way around it.

So, I thought I'd bring it up here so we can discuss potential implementations. I think we should take the "shoot yourself in the foot" approach. That is, we add an instruction type to LLVM that simply encapsulates an assembly language statement. This instruction type is ignored (but retained) by all the optimization passes. When code generation happens, the inline assembly is just blindly put out, and if the programmer has shot himself in the foot, so be it.

One other thing we could do *might* be useful: if a function contains only inline assembly instructions, we could circumvent the usual calling conventions for that function.

Thoughts?

Reid.
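To make the "ignored but retained" idea concrete, here is a minimal sketch in C of a toy instruction list. Everything in it (`struct inst`, `OP_INLINE_ASM`, `fold_constants`, `emit`) is hypothetical and invented for illustration; it is not LLVM's actual IR or pass interface. It only shows the shape of the proposal: an optimization pass rewrites ordinary instructions and skips the asm blob, and the code generator copies the blob out verbatim.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical toy IR for illustration only; not LLVM's real classes. */
enum opcode { OP_ADD_CONST, OP_INLINE_ASM };

struct inst {
    enum opcode op;
    int lhs, rhs;          /* operands of OP_ADD_CONST */
    const char *asm_text;  /* opaque text of OP_INLINE_ASM */
};

/* Stand-in optimization pass: folds constant adds in place and returns
 * how many it rewrote. Inline asm is never inspected or modified. */
static size_t fold_constants(struct inst *code, size_t n)
{
    size_t changed = 0;
    assert(code != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (code[i].op == OP_ADD_CONST && code[i].rhs != 0) {
            code[i].lhs += code[i].rhs;
            code[i].rhs = 0;
            changed++;
        }
    }
    return changed;
}

/* Stand-in code generator: writes one line per instruction into out.
 * Returns the number of characters written, or -1 if out is too small.
 * The asm text is put out blindly, exactly as the programmer wrote it. */
static int emit(const struct inst *code, size_t n, char *out, size_t cap)
{
    size_t used = 0;
    assert(code != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        int w;
        if (code[i].op == OP_INLINE_ASM)
            w = snprintf(out + used, cap - used, "%s\n", code[i].asm_text);
        else
            w = snprintf(out + used, cap - used, "mov $%d, %%eax\n",
                         code[i].lhs + code[i].rhs);
        if (w < 0 || (size_t)w >= cap - used)
            return -1;
        used += (size_t)w;
    }
    return (int)used;
}
```

The sketch deliberately gives the pass no way to look inside `asm_text`, which is exactly where the foot-shooting comes from: nothing here knows which registers or memory the blob touches.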
Reid Spencer wrote:
> In order to get to the next stage with LLVM (like compiling a kernel) we
> need to allow "pass through" of inline assembly so things like device
> drivers, interrupt vectors, etc. can be written. While this feature
> breaks the "pure" LLVM IR, I don't see any way around it.

<shameless plug>
Actually, there should be a way around it. I'm currently working on extensions to LLVM for operating system support. You wouldn't be able to take the stock i386 Linux kernel and compile it, but you could write an operating system that would be completely compilable by LLVM (once I finish, that is).

Currently, I'm modifying the Linux kernel to use LLVM intrinsics instead of inline asm. For now, the intrinsics are simply library routines linked into the kernel, but someday (if all goes according to plan) they will become LLVM intrinsics.
</shameless plug>

<technical aside>
The difficult part of an OS is not actually all the funky hardware stuff. The intrinsics for those are very straightforward and easy to implement. I/O, for example, is really volatile loads and stores with MEMBARs. Registering interrupt handlers takes some very straightforward intrinsics. The I/O intrinsics are already implemented for LLVM in the x86 code generator (minus the FENCE/MEMBAR instructions).

The difficult part is the code of the OS that changes native hardware state. The kernel's code for changing the program counter to execute a signal handler, or the code in fork() that sets up the new process to return zero when it begins running for the first time: these are the hard parts, because native i386 state is visible in LLVM programs (more accurately: for our research, we don't want it visible).
</technical aside>

> So, I thought I'd bring it up here so we can discuss potential
> implementations. I think we should take the "shoot yourself in the foot
> approach". That is, we add an instruction type to LLVM that simply
> encapsulates an assembly language statement. This instruction type is
> just simply ignored (but retained) by all the optimization passes. When
> code generation happens, the inline assembly is just blindly put out and
> if the programmer has shot himself in the foot, so be it.

Question: Do you want inline asm so that we can compile programs out of the box? Or do you want it so that we can use native hardware features that we can't use now?

For the former, we need inline i386/sparc/whatever support. For the latter, LLVM intrinsics should do the trick, and do it rather portably.

The approach you suggest might work, although the code generator will need to know not to tromp on your registers, I guess.

The bigger problem is GCC. GCC provides extended inline asm stuff that will probably be painful to pass from GCC to LLVM (and Linux, BTW, uses this feature a lot).

Another thought:

My impression is that inline assembly bites us a lot not because it's used a lot but because the LLVM compiler enables #defines for the i386 platform that we don't support.

I think a lot of code has the following:

    #ifdef _i386
    inline asm
    #else
    slow C code
    #endif

The LLVM GCC compiler still defines _i386 (or its equivalent), so configure and llvm-gcc end up trying to compile inline assembly code when they don't really need to.

I have to admit that this is an impression and not something I know for sure, but it seems reasonable that many application programs use i386 assembly because i386 is the most common platform, and speedups on it are good.

Changing llvm-gcc to disable the _i386-like macros might make compilation of userspace programs easier.

So, summary:

o If you just want access to native hardware, the intrinsics I'm developing will be much cleaner than inline asm support (and portable too).

o If you want inline asm to compile programs out of the box, it'll be more painful than what you've described.

o Changing llvm-gcc so that it doesn't look like an i386 compiler might make it easier to compile applications with optional inline asm.

Sorry if this is a bit rantish; my thoughts on the matter are not well organized.

> One other thing we can do that *might* be useful. If a function contains
> only inline assembly instructions, we could circumvent the usual calling
> conventions for that function.
>
> Thoughts?
>
> Reid.

-- John T.

John T. Criswell (criswell at uiuc.edu)
Research Programmer
University of Illinois at Urbana-Champaign
"It's today!" said Piglet. "My favorite day," said Pooh.
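The `#ifdef` pattern described above can be sketched in C as follows. This is only an illustration under stated assumptions: `swap32`, `swap32_self_check` and the opt-out macro `PORTABLE_ONLY` are hypothetical names made up here, and the guard uses the `__i386__` / `__x86_64__` spellings that GCC-compatible compilers predefine today rather than the `_i386` shorthand used in the message.

```c
#include <assert.h>
#include <stdint.h>

/* Byte-swap a 32-bit value. The asm path is chosen purely by predefined
 * macros, so a compiler that advertises itself as i386 gets the inline
 * asm whether or not its back end can handle it. */
static uint32_t swap32(uint32_t x)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__)) \
    && !defined(PORTABLE_ONLY) /* PORTABLE_ONLY: hypothetical opt-out */
    __asm__("bswap %0" : "+r"(x));   /* fast path: one x86 instruction */
    return x;
#else
    /* "slow C code": portable fallback with identical results */
    return (x >> 24) | ((x >> 8) & 0x0000ff00u)
         | ((x << 8) & 0x00ff0000u) | (x << 24);
#endif
}

/* Small self-check that holds on either path. */
static void swap32_self_check(void)
{
    assert(swap32(0x11223344u) == 0x44332211u);
    assert(swap32(swap32(0xdeadbeefu)) == 0xdeadbeefu);
}
```

If the compiler stopped defining the i386 macros (or a build defined something like `PORTABLE_ONLY`), the same source would fall through to the C branch with no inline asm support needed.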
On Mon, 13 Sep 2004 11:40:48 -0500 John Criswell <criswell at cs.uiuc.edu> wrote:
> Reid Spencer wrote:
> > In order to get to the next stage with LLVM (like compiling a kernel) we
> > need to allow "pass through" of inline assembly so things like device
> > drivers, interrupt vectors, etc. can be written. While this feature
> > breaks the "pure" LLVM IR, I don't see any way around it.
>
> The approach you suggest might work, although the code generator will
> need to know not to tromp on your registers, I guess.

It's worse than just knowing what registers are used by inlined assembler. You want the inline assembler to be able to reference local and global variables and function arguments. Plus, you have to be able to handle transfers of control inside the inlined assembler, such as a return, a branch to a label defined outside of the inlined assembler, or even calls to other functions (to properly handle inter-procedural optimization). It can get quite messy. It will be a lot of work to do it as well as gcc or Microsoft's compiler.
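As a small illustration of asm that references a variable and a function argument, here is a sketch using GCC's extended asm syntax, where operands and clobbers are declared so the compiler knows what the blob reads and writes. The function name `add_into` is hypothetical, and the asm path is assumed to apply only on x86 with a GCC-compatible compiler; elsewhere the plain C fallback is used.

```c
#include <assert.h>
#include <stddef.h>

/* Add the argument b to the variable *acc.
 * "+m"(*acc): the variable is read and written in memory by the asm.
 * "r"(b):     the argument is supplied in a register of the compiler's choice.
 * "cc":       the asm changes the condition codes. */
static void add_into(int *acc, int b)
{
    assert(acc != NULL);
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__("addl %1, %0" : "+m"(*acc) : "r"(b) : "cc");
#else
    *acc += b;   /* portable fallback with the same effect */
#endif
}
```

This covers only the data side of the problem. Control transfers out of the asm (returns, branches to outside labels, calls) are not expressible with this simple form, which is the messier part the message points to.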
On Mon, 13 Sep 2004, John Criswell wrote:
> Actually, there should be a way around it. I'm currently working on
> extensions to LLVM for operating system support. You wouldn't be able
> to take the stock i386 Linux kernel and compile it, but you could write
> an operating system that would be completely compilable by LLVM (once I
> finish, that is).

Being able to use intrinsics is definitely good, but it's not sufficient. There will always be things we don't cover, and inline asm will be required. In any case, compiling programs off the shelf certainly does require inline asm support, so we do need it regardless of what intrinsics we have.

> The difficult part is the code of the OS that changes native hardware
> state. The kernel's code for changing the program counter to execute a
> signal handler, or the code in fork() that sets up the new process to
> return zero when it begins running for the first time: these are the
> hard parts, because native i386 state is visible in LLVM programs (more
> accurately; for our research, we don't want it visible).

Some things really do want to be written in inline asm, and those things are obviously non-portable. This is not a problem; the goal of LLVM isn't to turn every non-portable program into a portable one :)

> The bigger problem is GCC. GCC provides extended inline asm stuff that
> will probably be painful to pass from GCC to LLVM (and Linux, BTW, uses
> this feature a lot).

Actually, the inline asm support provided by GCC is quite well thought out and makes a lot of sense (inline asms are required to define their side effects in target-independent terms). The big complaint that I have is its incredibly baroque syntax. Eventually we should also support other forms of inline asm by translating them into the LLVM inline asm format, but keeping the inline asm format semantically equivalent to the GCC format is basically what we want.

> My impression is that inline assembly bites us a lot not because it's
> used a lot but because the LLVM compiler enables #defines for the i386
> platform that we don't support.

We should aspire to be as compatible with GCC as reasonable, and including inline asm support is a big piece of that.

In terms of implementation, adding inline asm support is just a "small matter of implementation": it shouldn't cause any fundamental problems with the LLVM design. In particular, LLVM should get an "asm" Instruction, which takes a blob of text and some arguments. The big missing feature in LLVM is multiple return value support, which is required by asms that define multiple registers. My notes on multiple return values are here if anyone is interested:

http://nondot.org/sabre/LLVMNotes/MultipleReturnValues.txt

-Chris

--
http://llvm.org/
http://nondot.org/sabre/
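To show what an asm that "defines multiple registers" looks like in the GCC format, here is a sketch built around the x86 `divl` instruction, which leaves the quotient in EAX and the remainder in EDX. The function name `divmod_u32` is hypothetical, and the asm path assumes x86 and a GCC-compatible compiler; other targets take the C fallback.

```c
#include <assert.h>
#include <stdint.h>

/* Compute n / d and n % d with one instruction where possible.
 * Outputs: "=a" binds the quotient to EAX, "=d" binds the remainder to EDX.
 * Inputs:  "0"(n) places n in EAX, "1"(0u) zeroes EDX (high half of the
 *          dividend), "r"(d) puts the divisor in some other register. */
static void divmod_u32(uint32_t n, uint32_t d, uint32_t *quot, uint32_t *rem)
{
    assert(d != 0 && quot != 0 && rem != 0);
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    uint32_t q, r;
    __asm__("divl %4"
            : "=a"(q), "=d"(r)
            : "0"(n), "1"(0u), "r"(d)
            : "cc");
    *quot = q;
    *rem = r;
#else
    *quot = n / d;   /* portable fallback */
    *rem = n % d;
#endif
}
```

A single "asm" instruction modeling this statement would have to produce two results at once, which is why multiple return values come up as the missing piece.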
Andrew Lenharth
2004-Sep-17 04:24 UTC
[LLVMdev] Inline Assembly (unique arch string for llvm)
On Mon, 2004-09-13 at 11:40, John Criswell wrote:
> My impression is that inline assembly bites us a lot not because it's
> used a lot but because the LLVM compiler enables #defines for the i386
> platform that we don't support.
>
> I think a lot of code has the following:
>
>     #ifdef _i386
>     inline asm
>     #else
>     slow C code
>     #endif
>
> The LLVM GCC compiler still defines _i386 (or its equivalent), so
> configure and llvm-gcc end up trying to compile inline assembly code
> when they don't really need to.
>
> I have to admit that this is an impression and not something I know for
> sure, but it seems reasonable that many application programs use i386
> assembly because i386 is the most common platform, and speedups on it
> are good.
>
> Changing llvm-gcc to disable the _i386-like macros might make
> compilation of userspace programs easier.

When I was working on porting glibc (currently being held up by a C99 support bug), the most straightforward approach was to define a new architecture string and implement a new target in glibc based on that machine string.

So I propose that llvm-gcc not consider itself any type of x86-linux (or whatever platform it was compiled on), but rather create a new architecture, say llvm (or perhaps two, one each for big and little endian). Thus llvm-gcc -dumpmachine would return llvm-os.

This would make system library (and OS kernel!) ports easier to maintain, since arch llvm would be supported by adding stuff rather than changing stuff, and all the inline asm for known archs would go away and the C version would be used.

In most cases the config scripts should treat compiling with llvm on a host as a cross compile from the host arch to arch llvm.

Andrew
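A configure-style check for the proposed architecture string could look roughly like the C sketch below. It is an assumption-laden illustration: the helper names `machine_arch_is` and `use_generic_c_sources` are hypothetical, and so is the concrete string "llvm-linux", which just stands in for the "llvm-os" form that `-dumpmachine` would report under this proposal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if the architecture field of a machine string (the text before
 * the first '-') is exactly arch. */
static bool machine_arch_is(const char *machine, const char *arch)
{
    assert(machine != NULL && arch != NULL);
    const char *dash = strchr(machine, '-');
    size_t len = dash ? (size_t)(dash - machine) : strlen(machine);
    return strlen(arch) == len && strncmp(machine, arch, len) == 0;
}

/* Hypothetical build policy: for arch "llvm", skip every per-arch asm
 * directory and build the generic C sources instead. */
static bool use_generic_c_sources(const char *machine)
{
    return machine_arch_is(machine, "llvm");
}
```

With a check like this, a build system fed "llvm-linux" would select the C versions, while "i686-pc-linux-gnu" would keep selecting the x86 asm as before.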