thr3ads.net - llvm dev - [LLVMdev] LLVM and newlib progress [Nov 2006]

Pertti Kellomäki

2006-Nov-09 13:29 UTC

[LLVMdev] LLVM and newlib progress

I managed to compile newlib with llvm-gcc yesterday. That
is, the machine independent part is now basically done, and
the syscall part contains no-op stubs provided by libgloss.
I haven't tested the port yet, but since newlib has already
been ported to many architectures, I would be pretty surprised
if there were any major problems.

A couple of things I noticed when configuring newlib for LLVM.
First, I did not find any preprocessor symbols that I could use
to identify that we are compiling to LLVM byte code. If there is
one, I'd be happy to hear it, but if not, then it might be a good
idea to define __LLVM__ or something like that in (by) llvm-gcc.
Another related thing is that even when I put -emit-llvm in
what I thought would be a global CFLAGS for all of newlib, it did
not get propagated to all subdirectories.

I solved both of these issues by creating a shell script that is
just a fall-through to llvm-gcc, but passes "-emit-llvm -D__LLVM__"
to it. It might be worthwhile to have a similar thing in the LLVM
distribution, that is, a compiler that would identify the target as
LLVM and produce byte code by default.

There was very little to do in terms of porting. Basically
the only thing I needed to tweak in the source code was to define
floating point endianness, which I randomly picked to be
__IEEE_BIG_ENDIAN. Hopefully someone can confirm or correct my
choice.
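
One way to check the choice rather than guess is to look at how the
compiler at hand lays out a double in memory. The following is only a
sketch, assuming doubles are IEEE 754 binary64; the helper names are
hypothetical and not part of newlib. Note that it reports the layout of
the machine it runs on, which is not necessarily the final target.

```c
#include <string.h>

/* Assumption: double is IEEE 754 binary64, so 1.0 is encoded as
 * 0x3FF0000000000000 and 0x3F is the most significant byte. */

/* Returns 1 if the most significant byte of a double is stored first. */
int double_is_big_endian(void)
{
    const double one = 1.0;
    unsigned char bytes[sizeof one];

    memcpy(bytes, &one, sizeof one);
    return bytes[0] == 0x3F;
}

/* Returns 1 if the most significant byte of a double is stored last. */
int double_is_little_endian(void)
{
    const double one = 1.0;
    unsigned char bytes[sizeof one];

    memcpy(bytes, &one, sizeof one);
    return bytes[sizeof one - 1] == 0x3F;
}
```

If double_is_big_endian() returns 1, __IEEE_BIG_ENDIAN matches the
layout; if double_is_little_endian() returns 1, the counterpart
__IEEE_LITTLE_ENDIAN would be the matching definition.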

The next task is to go for the system calls. As I said earlier,
I plan to use intrinsic functions as place holders. Any opinions
on how to name them? Currently there are a few intrinsics that have
to do with libc, like llvm.memcpy and llvm.memmove. However, I
would personally prefer less pollution in the intrinsic name space,
so I would propose naming the intrinsics with a llvm.libc prefix,
e.g. llvm.libc.open and so forth. Any strong opinions on this?
-- 
Pertti

Reid Spencer

2006-Nov-09 15:34 UTC

Hi Pertti,

On Thu, 2006-11-09 at 15:29 +0200, Pertti Kellomäki wrote:
> I managed to compile newlib with llvm-gcc yesterday. That
> is, the machine independent part is now basically done, and
> the syscall part contains no-op stubs provided by libgloss.
> I haven't tested the port yet, but since newlib has already
> been ported to many architectures, I would be pretty surprised
> if there were any major problems.
Very nice.
> A couple of things I noticed when configuring newlib for LLVM.
> First, I did not find any preprocessor symbols that I could use
> to identify that we are compiling to LLVM byte code. If there is
> one, I'd be happy to hear it, but if not, then it might be a good
> idea to define __LLVM__ or something like that in (by) llvm-gcc.
That's a good idea, especially for inline ASM things.
> Another related thing is that even when I defined -emit-llvm in
> what I thought would be a global CFLAGS for all of newlib, it did
> not get propagated to all subdirectories.
Oh? Which ones did it not get propagated to?
> 
> I solved both of these issues by creating a shell script that is
> just a fall-through to llvm-gcc, but passes "-emit-llvm -D__LLVM__"
> to it. It might be worthwhile to have a similar thing in the LLVM
> distribution, that is, a compiler that would identify the target as
> LLVM and produce byte code by default.
> 
> There was very little to do in terms of porting. Basically
> the only thing I needed to tweak in the source code was to define
> floating point endianness, which I randomly picked to be
> __IEEE_BIG_ENDIAN. Hopefully someone can confirm or correct my
> choice.
I would think that it would follow the endianness of the host platform,
but someone else might have a more definitive answer.
> 
> The next task is to go for the system calls. As I said earlier,
> I plan to use intrinsic functions as place holders. 
Why? You should be able to compile any assembly code there using LLVM's
inline assembly feature. It is already good enough for compiling (most
of) Linux's inline assembly.
> Any opinions
> how to name them? 
I don't think it's appropriate to use intrinsics for this.  What is the
reason you think you need intrinsics for the system calls?
> Currently there are a few intrinsics that have
> to do with libc, like llvm.memcpy and llvm.memmove. However, I
> would personally prefer less pollution in the intrinsic name space,
> so I would propose naming the intrinsics with a llvm.libc prefix,
> e.g. llvm.libc.open and so forth. Any strong opinions on this?
Yes, it should be completely unnecessary to use intrinsics at all unless
there is a good optimization reason. The intrinsics we have are either
lowered generically (e.g. llvm.bswap becomes a series of shifts) or
lowered by the various targets into appropriate code for that target.
However, there shouldn't be any reason to implement the system calls
this way. Again, what issue are you trying to overcome that makes you
think intrinsics are the solution?

Reid.

Andrew Lenharth

2006-Nov-09 15:35 UTC

On 11/9/06, Pertti Kellomäki <pk at cs.tut.fi> wrote:
> The next task is to go for the system calls. As I said earlier,
> I plan to use intrinsic functions as place holders. Any opinions
> how to name them? Currently there are a few intrinsics that have
> to do with libc, like llvm.memcpy and llvm.memmove. However, I
> would personally prefer less pollution in the intrinsic name space,
> so I would propose naming the intrinsics with a llvm.libc prefix,
> e.g. llvm.libc.open and so forth. Any strong opinions on this?
There have been syscall intrinsic patches floating around in the past,
but the prevailing opinion right now is that this is a matter best
handled by inline assembly.  I would send you my old syscall
intrinsic patch, but it is out of date with respect to both
code generation and how one does intrinsics.

Andrew

John Criswell

2006-Nov-09 15:51 UTC

Pertti Kellomäki wrote:
> I managed to compile newlib with llvm-gcc yesterday. That
> is, the machine independent part is now basically done, and
> the syscall part contains no-op stubs provided by libgloss.
> I haven't tested the port yet, but since newlib has already
> been ported to many architectures, I would be pretty surprised
> if there were any major problems.
>
> A couple of things I noticed when configuring newlib for LLVM.
> First, I did not find any preprocessor symbols that I could use
> to identify that we are compiling to LLVM byte code. If there is
> one, I'd be happy to hear it, but if not, then it might be a good
> idea to define __LLVM__ or something like that in (by) llvm-gcc.
> Another related thing is that even when I defined -emit-llvm in
> what I thought would be a global CFLAGS for all of newlib, it did
> not get propagated to all subdirectories.
>
> I solved both of these issues by creating a shell script that is
> just a fall-through to llvm-gcc, but passes "-emit-llvm -D__LLVM__"
> to it. It might be worthwhile to have a similar thing in the LLVM
> distribution, that is, a compiler that would identify the target as
> LLVM and produce byte code by default.
>
> There was very little to do in terms of porting. Basically
> the only thing I needed to tweak in the source code was to define
> floating point endianness, which I randomly picked to be
> __IEEE_BIG_ENDIAN. Hopefully someone can confirm or correct my
> choice.
>
> The next task is to go for the system calls. As I said earlier,
> I plan to use intrinsic functions as place holders. Any opinions
> how to name them? Currently there are a few intrinsics that have
> to do with libc, like llvm.memcpy and llvm.memmove. However, I
> would personally prefer less pollution in the intrinsic name space,
> so I would propose naming the intrinsics with a llvm.libc prefix,
> e.g. llvm.libc.open and so forth. Any strong opinions on this?
>   
I agree with Reid; you should only need an intrinsic if you need to
inline the system call trapping code or want a singular function name
for system calls when performing analysis.  Otherwise, the system call
functions (open(), read(), etc) can be implemented in a native code
run-time library.

In the LLVA-OS project, we designed an intrinsic called llva_syscall (in
LLVM, it would be llvm.syscall()) that takes a system call number and a
set of parameters and calls that system call number with those
parameters.  It's a slightly higher level trap instruction that
encapsulates most of the OS system call calling conventions.  All of the
system calls (open(), read(), etc) are just library function wrappers
around llva_syscall() that provide the right system call number and
re-arrange the input parameters if necessary.
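
The wrapper structure described above can be sketched in C roughly as
follows. This is not the LLVA-OS code: sketch_syscall and the wrapper
names are hypothetical, the fixed three-argument signature is an
assumption, and glibc's syscall() on an LP64 Linux host stands in for
the natively linked llva_syscall().

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* ask glibc to declare syscall() */
#endif
#include <stddef.h>
#include <unistd.h>
#include <sys/syscall.h>

/* Single entry point: a system call number plus raw arguments.
 * Hypothetical stand-in for llva_syscall(); here it simply forwards
 * to the host C library's generic syscall() function. */
static long sketch_syscall(long number, long a0, long a1, long a2)
{
    return syscall(number, a0, a1, a2);
}

/* Library-level wrappers only pick the number and arrange arguments. */
long sketch_getpid(void)
{
    return sketch_syscall(SYS_getpid, 0, 0, 0);
}

long sketch_write(int fd, const void *buf, size_t len)
{
    /* Pointer-to-long cast assumes an LP64 host. */
    return sketch_syscall(SYS_write, (long)fd, (long)buf, (long)len);
}
```

With this shape, only sketch_syscall() has to know how to trap into the
kernel; everything above it is ordinary C that could be compiled to
bytecode.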

However, you'll notice that we've never implemented it in the LLVM code
generators.  That's because there's no need to do so unless you want to
have the system call trapping instruction inlined and you can't use the
LLVM C backend for code generation (i.e. llc -march=c).

What we have done is to implement the llva_syscall() "intrinsic" as an
external function at the LLVM bytecode level.  After code generation, we
can then link in a native code library that defines llva_syscall(). 
Furthermore, if using the C backend, we can define llva_syscall() in a
header file and #include it into the program using GCC's -include
option.  This allows the llva_syscall() function to be inlined where
appropriate.

I have an implementation of the x86/Linux llva_syscall() header file
that I can give you, if you need it.  I also have a prototype library,
libsys, which implements all of the Linux system calls as calls to
llva_syscall().  It's (mostly) right for Linux 2.4.

<shameless plug>
More information on the llva_syscall() instruction can be found in our
paper at http://llvm.org/pubs/2006-06-18-WIOSCA-LLVAOS.pdf in section III.F.
</shameless plug>

Regards,

-- John T.

Pertti Kellomäki

2006-Nov-09 16:41 UTC

Hi Reid,

I'll write a separate post about the intrinsics, but just
a quick note about the CFLAGS issue.

Reid Spencer wrote:
> On Thu, 2006-11-09 at 15:29 +0200, Pertti Kellomäki wrote:
>> Another related thing is that even when I defined -emit-llvm in
>> what I thought would be a global CFLAGS for all of newlib, it did
>> not get propagated to all subdirectories.
> Oh? Which ones did it not get propagated to?
I did not see it being propagated to libgloss, but maybe I
was trying to define the flags at the wrong place. Since the
llvm-gcc shell script solved my immediate problem, I did not
bother looking any further.
-- 
Pertti

Pertti Kellomäki

2006-Nov-09 18:35 UTC

This is in response to Reid's and John's comments about
intrinsics.

The setting of the work is a project on reconfigurable
processors using the Transport Triggered Architecture (TTA)
<http://en.wikipedia.org/wiki/Transport_triggered_architecture>.
For the compiler this means that the target architecture
is not fixed, but rather an instance of a processor template.
Different instances of the template can vary in the mix of
function units and their connectivity. In addition to the
source files, the compiler takes a processor description
as input.

In practical terms this means that there is not much point
in keeping native libraries around, as the processor instances
are not compatible with each other. There is also no operating
system to make calls to. I/O is done by fiddling bits in
function units.

The plan is to use LLVM as a front end, and write a back
end that maps LLVM byte code to the target architecture.
One of the main issues is instruction scheduling, in order to
utilize the instruction level parallelism that TTAs potentially
provide.

Much of libc is just convenience functions expressible in plain C,
so my plan is to compile newlib to byte code libraries, which
would be linked with the application at the byte code level.
The linked byte code would then be passed to the back end and
mapped to the final target.

The only issue is how to deal with system calls. The idea
of using intrinsic functions for them comes from the way
memcpy etc. are currently handled in LLVM. At LLVM byte code level,
the libraries would contain calls to the intrinsic functions in
appropriate places, and upon encountering them the back end
would generate the corresponding code for the target.

If there are better options, I'm all ears. I have not committed
a single line of code yet, so design changes are very easy to do ;-)
We do have a linker for the target architecture, so I suppose
it would be possible to leave calls to the library functions
involving I/O unresolved at the byte code level, and link those
functions in at the target level. At first glance intrinsics
seem to be less hassle, but I could well be wrong.

In practice I/O will probably boil down to reading a byte and writing
a byte, mainly for debugging purposes. My understanding is that
the real I/O will take place via dual port memories, DMA, or
some other mechanism outside of libc.
-- 
Pertti

Pertti Kellomäki

2006-Nov-09 18:37 UTC

Chris Lattner wrote:
> llvm-gcc defines __llvm__.
Thanks. I thought I tried it, but apparently not.
-- 
Pertti

Chris Lattner

2006-Nov-09 19:45 UTC

On Thu, 9 Nov 2006, Pertti Kellomäki wrote:
> to identify that we are compiling to LLVM byte code. If there is
> one, I'd be happy to hear it, but if not, then it might be a good
> idea to define __LLVM__ or something like that in (by) llvm-gcc.
llvm-gcc defines __llvm__.
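
A minimal sketch of how library code might key off that macro, so that
no wrapper-supplied -D flag is needed. The function names and the
returned strings are hypothetical, chosen only for illustration.

```c
/* Returns 1 when the compiler predefines __llvm__, 0 otherwise. */
int compiled_for_llvm(void)
{
#ifdef __llvm__
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: names the code path a port could select. */
const char *port_flavor(void)
{
    return compiled_for_llvm() ? "llvm" : "native";
}
```

The same #ifdef can guard target-specific pieces such as inline
assembly or the floating point endianness definition.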

-Chris

-- 
http://nondot.org/sabre/
http://llvm.org/

Chris Lattner

2006-Nov-09 19:47 UTC

On Thu, 9 Nov 2006, Reid Spencer wrote:
>> Currently there are a few intrinsics that have
>> to do with libc, like llvm.memcpy and llvm.memmove. However, I
>> would personally prefer less pollution in the intrinsic name space,
>> so I would propose naming the intrinsics with a llvm.libc prefix,
>> e.g. llvm.libc.open and so forth. Any strong opinions on this?
>
> Yes, it should be completely unnecessary to use intrinsics at all unless
> there is a good optimization reason. The intrinsics we have are either
> lowered generically (e.g. llvm.bswap becomes a series of shifts) or
> lowered by the various targets into appropriate code for that target.
> However, there shouldn't be any reason to implement the system calls
> this way. Again, what issue are you trying to overcome that you think
> intrinsics is the solution?
As a specific example, compiling "printf" to llvm .bc form is very
useful. However, printf ends up calling "write" at some point, which is
a syscall. There isn't any really good reason to have an llvm intrinsic
for write; just leave 'write' as an external function.
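
In C terms, the external-function approach could look roughly like the
sketch below. All names are hypothetical: board_putc stands for whatever
byte-output routine the target provides, and the definition at the end
is only a host stand-in so the example links; on a real target it would
come from the native link step instead.

```c
#include <stddef.h>

/* Declared but not defined in the portable library: in a bytecode
 * build this symbol stays unresolved until the final link. */
extern int board_putc(int c);

/* Portable library code; plays the role of printf calling write. */
size_t lib_put_string(const char *s)
{
    size_t n = 0;

    while (s[n] != '\0') {
        if (board_putc((unsigned char)s[n]) < 0)
            break;              /* stop on the first output error */
        n++;
    }
    return n;                   /* number of bytes actually written */
}

/* ---- Host stand-in for the target-level definition ---- */
static char captured[64];
static size_t captured_len;

int board_putc(int c)
{
    if (captured_len + 1 >= sizeof captured)
        return -1;              /* buffer full */
    captured[captured_len++] = (char)c;
    captured[captured_len] = '\0';
    return c;
}

/* Lets a caller inspect what the stand-in has received so far. */
const char *captured_output(void)
{
    return captured;
}
```

The library half contains no target knowledge at all, so it can be
optimized and linked as bytecode; only board_putc has to be supplied
per target.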

-Chris

-- 
http://nondot.org/sabre/
http://llvm.org/

Chris Lattner

2006-Nov-09 20:22 UTC

On Thu, 9 Nov 2006, Markus F.X.J. Oberhumer wrote:
>> llvm-gcc defines __llvm__.
> Could we add some more detailed version information to the frontend,
> e.g. such as a predefined -D__llvm_bytecode_version__=6 ?
Why do you need this?

-Chris

-- 
http://nondot.org/sabre/
http://llvm.org/
