I managed to compile newlib with llvm-gcc yesterday. That is, the machine-independent part is now basically done, and the syscall part contains no-op stubs provided by libgloss. I haven't tested the port yet, but since newlib has already been ported to many architectures, I would be pretty surprised if there were any major problems.

I noticed a couple of things when configuring newlib for LLVM. First, I did not find any preprocessor symbol that I could use to identify that we are compiling to LLVM byte code. If there is one, I'd be happy to hear it, but if not, then it might be a good idea to define __LLVM__ or something like that in (by) llvm-gcc. A related issue is that even when I put -emit-llvm in what I thought would be a global CFLAGS for all of newlib, it did not get propagated to all subdirectories.

I solved both of these issues by creating a shell script that is just a fall-through to llvm-gcc, but passes "-emit-llvm -D__LLVM__" to it. It might be worthwhile to have a similar thing in the LLVM distribution, that is, a compiler that would identify the target as LLVM and produce byte code by default.

There was very little to do in terms of porting. Basically the only thing I needed to tweak in the source code was to define the floating point endianness, which I arbitrarily picked to be __IEEE_BIG_ENDIAN. Hopefully someone can confirm or correct my choice.

The next task is to go for the system calls. As I said earlier, I plan to use intrinsic functions as placeholders. Any opinions on how to name them? Currently there are a few intrinsics that have to do with libc, like llvm.memcpy and llvm.memmove. However, I would personally prefer less pollution in the intrinsic name space, so I would propose naming the intrinsics with an llvm.libc prefix, e.g. llvm.libc.open and so forth. Any strong opinions on this?

-- Pertti
Hi Pertti,

On Thu, 2006-11-09 at 15:29 +0200, Pertti Kellomäki wrote:
> I managed to compile newlib with llvm-gcc yesterday. That
> is, the machine independent part is now basically done, and
> the syscall part contains no-op stubs provided by libgloss.
> I haven't tested the port yet, but since newlib has already
> been ported to many architectures, I would be pretty surprised
> if there were any major problems.

Very nice.

> A couple of things I noticed when configuring newlib for LLVM.
> First, I did not find any preprocessor symbols that I could use
> to identify that we are compiling to LLVM byte code. If there is
> one, I'd be happy to hear it, but if not, then it might be a good
> idea to define __LLVM__ or something like that in (by) llvm-gcc.

That's a good idea, especially for inline ASM things.

> Another related thing is that even when I defined -emit-llvm in
> what I thought would be a global CFLAGS for all of newlib, it did
> not get propagated to all subdirectories.

Oh? Which ones did it not get propagated to?

> I solved both of these issues by creating a shell script that is
> just a fall-through to llvm-gcc, but passes "-emit-llvm -D__LLVM__"
> to it. It might be worthwhile to have a similar thing in the LLVM
> distribution, that is, a compiler that would identify the target as
> LLVM and produce byte code by default.
>
> There was very little to do in terms of porting. Basically
> the only thing I needed to tweak in the source code was to define
> floating point endianness, which I randomly picked to be
> __IEEE_BIG_ENDIAN. Hopefully someone can confirm or correct my
> choice.

I would think that it would follow the endianness of the host platform, but someone else might have a more definitive answer.

> The next task is to go for the system calls. As I said earlier,
> I plan to use intrinsic functions as place holders.

Why? You should be able to compile any assembly code there using LLVM's inline assembly feature. It is already good enough for compiling (most of) Linux's inline assembly.

> Any opinions how to name them?

I don't think it's appropriate to use intrinsics for this. What is the reason you think you need intrinsics for the system calls?

> Currently there are a few intrinsics that have
> to do with libc, like llvm.memcpy and llvm.memmove. However, I
> would personally prefer less pollution in the intrinsic name space,
> so I would propose naming the intrinsics with a llvm.libc prefix,
> e.g. llvm.libc.open and so forth. Any strong opinions on this?

Yes, it should be completely unnecessary to use intrinsics at all unless there is a good optimization reason. The intrinsics we have are either lowered generically (e.g. llvm.bswap becomes a series of shifts) or lowered by the various targets into appropriate code for that target. However, there shouldn't be any reason to implement the system calls this way. Again, what issue are you trying to overcome that you think intrinsics are the solution?

Reid.
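On the floating point endianness question, one way to check Reid's suggestion on a given host is to look at how the bytes of a double are laid out in memory. The following is only a minimal sketch and not part of newlib: the name double_byte_order is made up for illustration, and it assumes an 8-byte IEEE 754 double.

```c
#include <string.h>

/* Hypothetical helper, not from newlib: reports how the host stores an
 * IEEE 754 double. Returns 1 if the most significant byte comes first
 * (the layout a big-endian IEEE setting describes), 0 if it comes last
 * (little-endian), and -1 for anything else, such as a mixed-endian
 * layout or a double that is not 8 bytes wide. */
int double_byte_order(void)
{
    double one = 1.0;                /* 0x3FF0000000000000 in IEEE 754 */
    unsigned char bytes[sizeof one];

    if (sizeof one != 8)
        return -1;
    memcpy(bytes, &one, sizeof one); /* inspect the raw storage */

    if (bytes[0] == 0x3F && bytes[1] == 0xF0)
        return 1;
    if (bytes[7] == 0x3F && bytes[6] == 0xF0)
        return 0;
    return -1;
}
```

A result of 0 on the machine that will run the code would suggest that a little-endian setting is the better match there. For a target that is not the build host, the answer has to come from the target description instead.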
On 11/9/06, Pertti Kellomäki <pk at cs.tut.fi> wrote:
> The next task is to go for the system calls. As I said earlier,
> I plan to use intrinsic functions as place holders. Any opinions
> how to name them? Currently there are a few intrinsics that have
> to do with libc, like llvm.memcpy and llvm.memmove. However, I
> would personally prefer less pollution in the intrinsic name space,
> so I would propose naming the intrinsics with a llvm.libc prefix,
> e.g. llvm.libc.open and so forth. Any strong opinions on this?

There have been syscall intrinsic patches floating around in the past, but the prevailing opinion right now is that this is a matter best handled by inline assembly. I would send you my old syscall intrinsic patch, but it is out of date with respect to both code generation and how one does intrinsics.

Andrew
Pertti Kellomäki wrote:
> The next task is to go for the system calls. As I said earlier,
> I plan to use intrinsic functions as place holders. Any opinions
> how to name them? Currently there are a few intrinsics that have
> to do with libc, like llvm.memcpy and llvm.memmove. However, I
> would personally prefer less pollution in the intrinsic name space,
> so I would propose naming the intrinsics with a llvm.libc prefix,
> e.g. llvm.libc.open and so forth. Any strong opinions on this?

I agree with Reid; you should only need an intrinsic if you need to inline the system call trapping code or want a single function name for system calls when performing analysis. Otherwise, the system call functions (open(), read(), etc.) can be implemented in a native code run-time library.

In the LLVA-OS project, we designed an intrinsic called llva_syscall (in LLVM, it would be llvm.syscall()) that takes a system call number and a set of parameters and calls that system call number with those parameters. It's a slightly higher-level trap instruction that encapsulates most of the OS system call calling conventions. All of the system calls (open(), read(), etc.) are just library function wrappers around llva_syscall() that provide the right system call number and re-arrange the input parameters if necessary.

However, you'll notice that we've never implemented it in the LLVM code generators. That's because there's no need to do so unless you want to have the system call trapping instruction inlined and you can't use the LLVM C backend for code generation (i.e. llc -march=c).

What we have done is to implement the llva_syscall() "intrinsic" as an external function at the LLVM bytecode level. After code generation, we can then link in a native code library that defines llva_syscall().

Furthermore, if using the C backend, we can define llva_syscall() in a header file and #include it into the program using GCC's -include option. This allows the llva_syscall() function to be inlined where appropriate.

I have an implementation of the x86/Linux llva_syscall() header file that I can give you, if you need it. I also have a prototype library, libsys, which implements all of the Linux system calls as calls to llva_syscall(). It's (mostly) right for Linux 2.4.

<shameless plug>
More information on the llva_syscall() instruction can be found in our paper at http://llvm.org/pubs/2006-06-18-WIOSCA-LLVAOS.pdf in section III.F.
</shameless plug>

Regards,

-- John T.
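The wrapper arrangement John describes can be sketched roughly as follows. This is not the LLVA-OS or libsys code: the four-argument signature given to llva_syscall(), the sketch_ names, and the system call numbers are all assumptions made for illustration, and llva_syscall() is defined here as a host stand-in that only records its arguments. In the setup described above it would instead be left external at the bytecode level and supplied by a native library or an included header.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative system call numbers; treat them as placeholders. */
enum { SKETCH_SYS_READ = 3, SKETCH_SYS_WRITE = 4 };

/* Record of the most recent trap, so the stand-in can be inspected. */
struct sketch_trap {
    long number;
    intptr_t arg0, arg1, arg2;
};
struct sketch_trap sketch_last_trap;

/* Host stand-in with an assumed signature. A real definition would
 * trap into the operating system instead of recording the call. */
long llva_syscall(long number, intptr_t arg0, intptr_t arg1, intptr_t arg2)
{
    sketch_last_trap.number = number;
    sketch_last_trap.arg0 = arg0;
    sketch_last_trap.arg1 = arg1;
    sketch_last_trap.arg2 = arg2;
    return (long)arg2;  /* pretend every requested byte was transferred */
}

/* Library-level wrappers: each one only supplies the system call
 * number and passes its parameters through in the expected order. */
long sketch_write(int fd, const void *buf, size_t count)
{
    return llva_syscall(SKETCH_SYS_WRITE, (intptr_t)fd,
                        (intptr_t)buf, (intptr_t)count);
}

long sketch_read(int fd, void *buf, size_t count)
{
    return llva_syscall(SKETCH_SYS_READ, (intptr_t)fd,
                        (intptr_t)buf, (intptr_t)count);
}
```

The point of the layering is that only llva_syscall() is platform specific; the wrappers themselves are plain C and can be compiled to bytecode like the rest of the library.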
Hi Reid,

I'll write a separate post about the intrinsics, but just a quick note about the CFLAGS issue.

Reid Spencer wrote:
> On Thu, 2006-11-09 at 15:29 +0200, Pertti Kellomäki wrote:
>> Another related thing is that even when I defined -emit-llvm in
>> what I thought would be a global CFLAGS for all of newlib, it did
>> not get propagated to all subdirectories.
>
> Oh? Which ones did it not get propagated to?

I did not see it being propagated to libgloss, but maybe I was trying to define the flags in the wrong place. Since the llvm-gcc shell script solved my immediate problem, I did not bother looking any further.

-- Pertti
This is in response to Reid's and John's comments about intrinsics.

The setting of the work is a project on reconfigurable processors using the Transport Triggered Architecture (TTA) <http://en.wikipedia.org/wiki/Transport_triggered_architecture>. For the compiler this means that the target architecture is not fixed, but rather an instance of a processor template. Different instances of the template can vary in the mix of function units and their connectivity. In addition to the source files, the compiler takes a processor description as input.

In practical terms this means that there is not much point in keeping native libraries around, as the processor instances are not compatible with each other. There is also no operating system to make calls to. I/O is done by fiddling bits in function units.

The plan is to use LLVM as a front end, and write a back end that maps LLVM byte code to the target architecture. One of the main issues is instruction scheduling, in order to utilize the instruction level parallelism that TTAs potentially provide.

Much of libc is just convenience functions expressible in plain C, so my plan is to compile newlib to byte code libraries, which would be linked with the application at the byte code level. The linked byte code would then be passed to the back end and mapped to the final target.

The only issue is how to deal with system calls. The idea of using intrinsic functions for them comes from the way memcpy etc. are currently handled in LLVM. At the LLVM byte code level, the libraries would contain calls to the intrinsic functions in appropriate places, and upon encountering them the back end would generate the corresponding code for the target.

If there are better options, I'm all ears. I have not committed a single line of code yet, so design changes are very easy to do ;-)

We do have a linker for the target architecture, so I suppose it would be possible to leave calls to the library functions involving I/O unresolved at the byte code level, and link those functions in at the target level. At first glance intrinsics seem to be less hassle, but I could well be wrong.

In practice I/O will probably boil down to reading a byte and writing a byte, mainly for debugging purposes. My understanding is that the real I/O will take place via dual port memories, DMA, or some other mechanism outside of libc.

-- Pertti
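To make the "writing a byte" idea concrete, here is a rough sketch of a write-like function expressed in plain C on top of a single byte-output primitive. Everything in it is hypothetical: tta_put_byte() and the byteio_ names are invented for illustration, and tta_put_byte() is defined as a host stand-in that appends to a buffer. In the scheme discussed above, that one function would be left unresolved at the byte code level and supplied at the target level.

```c
#include <stddef.h>

/* Host stand-in storage so the sketch can be exercised off-target. */
unsigned char byteio_out[64];
size_t byteio_out_len;

/* Hypothetical single-byte output primitive. On a real target this
 * would drive a function unit; here it just appends to a buffer and
 * silently drops bytes once the buffer is full. */
void tta_put_byte(unsigned char byte)
{
    if (byteio_out_len < sizeof byteio_out)
        byteio_out[byteio_out_len++] = byte;
}

/* write-like function in portable C: the only target-specific piece
 * it depends on is tta_put_byte(). The descriptor is ignored because
 * this sketch has a single output channel. */
long byteio_write(int fd, const void *buf, size_t count)
{
    const unsigned char *bytes = buf;
    size_t i;

    (void)fd;
    for (i = 0; i < count; i++)
        tta_put_byte(bytes[i]);
    return (long)count;
}
```

With this split, the bulk of the library stays target independent, and each processor instance only needs its own definition of the byte primitive.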
Chris Lattner wrote:
> llvm-gcc defines __llvm__.

Thanks. I thought I tried it, but apparently not.

-- Pertti
On Thu, 9 Nov 2006, Pertti Kellomäki wrote:
> to identify that we are compiling to LLVM byte code. If there is
> one, I'd be happy to hear it, but if not, then it might be a good
> idea to define __LLVM__ or something like that in (by) llvm-gcc.

llvm-gcc defines __llvm__.

-Chris

-- 
http://nondot.org/sabre/
http://llvm.org/
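A small sketch of how library code could key off that symbol. The function name built_with_llvm_frontend is made up for illustration; only the __llvm__ macro comes from the message above.

```c
/* Returns 1 when this file was compiled by a front end that defines
 * __llvm__, and 0 otherwise. The function name is hypothetical. */
int built_with_llvm_frontend(void)
{
#ifdef __llvm__
    return 1;   /* LLVM-based front end: avoid target-specific paths */
#else
    return 0;   /* some other compiler */
#endif
}
```

The same #ifdef can guard configuration choices directly, for example around inline assembly or a byte-order setting, without a wrapper script passing an extra -D flag.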
On Thu, 9 Nov 2006, Reid Spencer wrote:
>> Currently there are a few intrinsics that have
>> to do with libc, like llvm.memcpy and llvm.memmove. However, I
>> would personally prefer less pollution in the intrinsic name space,
>> so I would propose naming the intrinsics with a llvm.libc prefix,
>> e.g. llvm.libc.open and so forth. Any strong opinions on this?
>
> Yes, it should be completely unnecessary to use intrinsics at all unless
> there is a good optimization reason. The intrinsics we have are either
> lowered generically (e.g. llvm.bswap becomes a series of shifts) or
> lowered by the various targets into appropriate code for that target.
> However, there shouldn't be any reason to implement the system calls
> this way. Again, what issue are you trying to overcome that you think
> intrinsics are the solution?

As a specific example, compiling "printf" to llvm .bc form is very useful. However, printf ends up calling "write" at some point, which is a syscall. There isn't any really good reason to have an llvm intrinsic for write; just leave 'write' as an external function.

-Chris

-- 
http://nondot.org/sabre/
http://llvm.org/
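Chris's suggestion can be pictured with a small sketch: a puts-like helper written in plain C that only declares write() and never defines it, so the symbol stays external until something else provides it. The helper name sketch_put_line is hypothetical, and the declaration assumes the usual POSIX signature for write().

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Declared but deliberately not defined here. When this file is
 * compiled to bytecode, write stays an unresolved external symbol
 * that the host C library or a target-level library can supply. */
ssize_t write(int fd, const void *buf, size_t count);

/* Hypothetical puts-like helper: writes the text and a newline.
 * Returns the number of bytes written, or -1 on a failed or short
 * write. */
long sketch_put_line(int fd, const char *text)
{
    size_t len = strlen(text);
    ssize_t n = write(fd, text, len);

    if (n < 0 || (size_t)n != len)
        return -1;
    if (write(fd, "\n", 1) != 1)
        return -1;
    return (long)len + 1;
}
```

Nothing in the helper needs an intrinsic: from the compiler's point of view write is an ordinary function call whose definition arrives at link time.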
On Thu, 9 Nov 2006, Markus F.X.J. Oberhumer wrote:
>> llvm-gcc defines __llvm__.
>
> Could we add some more detailed version information to the frontend,
> e.g. such as a predefined -D__llvm_bytecode_version__=6 ?

Why do you need this?

-Chris

-- 
http://nondot.org/sabre/
http://llvm.org/