LLVMers,

Since there has been little feedback on the design document I sent out, some decisions are being made in order to move the work forward. If you have strong feelings about any of these, voice them now!

1. Name = llvmcc
2. The config file format will resemble Microsoft .ini files (name=value pairs grouped into sections)
3. The -O set of options will control what gets done and what kind of output is generated.

I'm going to start documenting the design and usage of llvmcc in a new HTML document in the docs directory. You can comment on the design further from the commits, if you like.

Reid.
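[Editor's note: purely for illustration, a driver configuration file in the name=value-in-sections style Reid describes might look roughly like the sketch below. The section and key names here are hypothetical, not part of any agreed-upon llvmcc schema.]

    # hypothetical llvmcc config for a front end
    [lang]
    name=Stacker
    version=1.0

    [translator]
    command=stkrc %in% -o %out%
    output=bytecode

    [optimizer]
    O1=-constprop -dce
    O2=-constprop -dce -gcse -licm

The point is only that a flat name=value format is easy to read and easy to parse by hand; whichever keys the driver actually defines would live in its own documentation.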
Dear All,

I thought I would chime in with some ideas and opinions:

o Configuration Files

If it isn't too much trouble, I think we should go with XML for the following reasons:

1) We wouldn't need to implement a parsing library. There are several XML parsing libraries available, and I'm guessing that they're available in several different programming languages (Reid, am I right on that?).

2) It makes it easier for other programmers to write tools that read, modify, and/or write the configuration file correctly. If my assumption about XML libraries being available in several different languages is correct, then we don't need to write a library for each language that people want to use.

3) I believe it would keep the format flexible enough for future expansion (but again, Reid would know better here).

Having configuration files that can be manipulated accurately is important for things like automatic installation, GUIs, configuration tools, etc.

o Object Files

I've noticed that there's general agreement that we should not encapsulate LLVM bytecode files inside another object format (such as ELF). However, I'd like to point out a few potential benefits that encapsulation in ELF might provide:

1) It may provide a way for standard UNIX tools to handle bytecode files without modification. For example, programs like ar, nm, and file all take advantage of the ELF format. If we generated LLVM ELF files, we wouldn't need to write our own nm and ar implementations and port them to various platforms.

2) It could mark the bytecode file with other bits of useful information, such as the OS and hardware on which the file was generated.

3) It may provide a convenient means of adding dynamic linking with other bytecode files.

4) It may provide a convenient place to cache native translations for use with the JIT.

Here are the disadvantages I see:

1) Increased disk usage. For example, symbol table information would duplicate the information already in the bytecode file.

2) Automatic execution. Ideally, if I have a bytecode executable, I want to run it directly. On UNIX, that is done with #!<interpreter>. I believe ELF provides similar functionality (where exec()ing the file can load a program or library to do JIT compilation), but if it doesn't, then we lose this feature.

o Compiler Driver Name

I'd vote for either llvmcc (llvm compiler collection) or llvmcd (llvm compiler driver). To be more convenient, we could call it llc (LLvm Compiler) or llcd (LLvm Compiler Driver). Calling it llc would require renaming the existing llc to something else, which might be appropriate since I view llc as a "code generator" and not as a "compiler" (although both terms are technically accurate). Generally, I recommend keeping the name short and avoiding hyphens (because they're slower to type).

o Optimization Options

I agree with the idea of using -O<number> for increasing levels of optimization, with -O0 meaning no optimization. It's a pretty intuitive scheme, and many Makefiles that use GCC use the -O option.

--
John T. Criswell
Research Programmer, University of Illinois at Urbana-Champaign
Email: criswell at uiuc.edu
"It's today!" said Piglet. "My favorite day," said Pooh.
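[Editor's note: for comparison with the .ini-style sketch above, the same hypothetical settings expressed as XML might look something like the following. The element and attribute names are invented for illustration and do not reflect any actual proposal.]

    <llvmcc-config version="1">
      <lang name="Stacker" version="1.0"/>
      <translator command="stkrc %in% -o %out%" output="bytecode"/>
      <optimizer level="1" passes="-constprop -dce"/>
      <optimizer level="2" passes="-constprop -dce -gcse -licm"/>
    </llvmcc-config>

The trade-off being debated in this thread is roughly: XML buys a standard syntax and off-the-shelf parsers at the cost of pulling in a parsing library, while the .ini style is trivial to parse by hand but comes with no standard tooling.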
On Wed, Aug 04, 2004 at 10:24:20AM -0500, John Criswell wrote:

> o Object Files
>
> I've noticed that there's general agreement that we should not
> encapsulate LLVM bytecode files inside another object format (such
> as ELF). However, I'd like to point out a few potential benefits that
> encapsulation in ELF might provide:
>
> 1) It may provide a way for standard UNIX tools to handle bytecode
> files without modification. For example, programs like ar, nm, and
> file all take advantage of the ELF format. If we generated LLVM ELF
> files, we wouldn't need to write our own nm and ar implementations and
> port them to various platforms.

The system `nm' has no meaning if it's run on an LLVM bytecode file. Right now, we already have an llvm-nm, and it works by finding the *LLVM* symbols, globals, and functions, and printing out whether they are defined or not. If we just plop the binary LLVM bytecode into an ELF section, it will be happily ignored by the system nm, and no useful output will be produced. So, in essence, we *do* need our own nm, ar, etc. Otherwise, what you're suggesting is that every bytecode file sits in its own ELF section alongside a *FULL* native translation, which is overkill, IMHO.

> 2) It could mark the bytecode file with other bits of useful
> information, such as the OS and hardware on which the file was
> generated.

We already have that: in addition to the pointer size, Reid has added the capability to encode the target triple of the system directly into the bytecode file.

> 3) It may provide a convenient means of adding dynamic linking with
> other bytecode files.

Reid has added this as well.

> 4) It may provide a convenient place to cache native translations for
> use with the JIT.

This is an interesting concept, but it seems to be the only one of the four left standing, and I'm not sure it's worth the trouble of writing, re-writing, and re-patching native code to support it...

> Here are the disadvantages I see:
>
> 1) Increased disk usage. For example, symbol table information would
> duplicate the information already in the bytecode file.

True that.

> 2) Automatic execution. Ideally, if I have a bytecode executable, I
> want to run it directly. On UNIX, that is done with #!<interpreter>.
> I believe ELF provides similar functionality (where exec()ing the file
> can load a program or library to do JIT compilation), but if it
> doesn't, then we lose this feature.

1. Use LLEE :)
2. Tell the OS (in this case Linux) how to run bytecode files directly:
   http://llvm.cs.uiuc.edu/docs/GettingStarted.html#optionalconfig

> o Compiler Driver Name
>
> I'd vote for either llvmcc (llvm compiler collection) or llvmcd (llvm
> compiler driver). To be more convenient, we could call it llc (LLvm
> Compiler) or llcd (LLvm Compiler Driver). Calling it llc would
> require renaming llc to something else, which might be appropriate
> since I view llc as a "code generator" and not as a "compiler"
> (although both terms are technically accurate).

I've voted for llvmcc before, but it was turned down. LLC is a nice idea, but yeah, it's already taken, and it sounds like LCC, which is another compiler... llvmcd sounds like "chdir compiled to llvm" or "LLVM-specific chdir" given the other tools: llvm-as, llvm-gcc, etc.

> o Optimization options
>
> I agree with the idea of using -O<number> for increasing levels of
> optimization, with -O0 meaning no optimization. It's a pretty
> intuitive scheme, and many Makefiles that use GCC use the -O option.

I agree with -O0 instead of -On.
-- Misha Brukman :: http://misha.brukman.net :: http://llvm.cs.uiuc.edu
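[Editor's note: as a concrete illustration of Misha's second suggestion above (telling Linux how to exec bytecode files directly), a binfmt_misc registration along these lines should work. The interpreter path and rule name below are assumptions; the magic string assumes bytecode files begin with the four characters "llvm", as Reid notes later in this thread. The GettingStarted link Misha gives is the authoritative recipe.]

    # mount the binfmt_misc filesystem if it is not already mounted
    mount -t binfmt_misc none /proc/sys/fs/binfmt_misc
    # register a rule: name=llvm, type=M (match magic bytes at offset 0),
    # magic="llvm", no mask, interpreter is the LLVM JIT/interpreter lli
    echo ':llvm:M::llvm::/usr/local/bin/lli:' > /proc/sys/fs/binfmt_misc/register

After that, "chmod +x program.bc && ./program.bc" hands the file to lli automatically.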
On Wed, 2004-08-04 at 08:24, John Criswell wrote:

> o Configuration Files
>
> If it isn't too much trouble, I think we should go with XML for the
> following reasons:
>
> 1) We wouldn't need to implement a parsing library. There are several
> XML parsing libraries available, and I'm guessing that they're available
> in several different programming languages (Reid, am I right on that?).

Yes, there are many to choose from. But some of them are larger than LLVM :). We'd choose expat (fast, simple, small, good compatibility, and it lacks features we don't need). Either that, or just write a really simple recursive descent parser (a minimal sketch of one appears after this message).

> 2) It makes it easier for other programmers to write tools that read,
> modify, and/or write the configuration file correctly. If my assumption
> about XML libraries being available in several different languages is
> correct, then that means we don't need to write a library for each
> language that people want to use.

I'm not sure what you mean here. What's an XML library, and are they supposed to be available in different natural languages, or different computer languages, or programming languages, or...? Do you mean natural languages?

> 3) I believe it would keep the format flexible enough for future
> expansion (but again, Reid would know better here).

Yes. It wouldn't be painless, but going from DTD1 -> DTD2 is much less painful than going from INI -> XML. That is, the ENTIRE format doesn't have to change; it's just incrementally changing its document type definition within the XML format.

> Having configuration files that can be manipulated accurately is
> important for things like automatic installation, GUIs, configuration
> tools, etc.

Yes, that was my main argument too: precision for us and for others.

> o Object Files
>
> I've noticed that there's general agreement that we should not
> encapsulate LLVM bytecode files inside another object format (such as
> ELF). However, I'd like to point out a few potential benefits that
> encapsulation in ELF might provide:
>
> 1) It may provide a way for standard UNIX tools to handle bytecode files
> without modification. For example, programs like ar, nm, and file all
> take advantage of the ELF format. If we generated LLVM ELF files, we
> wouldn't need to write our own nm and ar implementations and port them
> to various platforms.

Consider this: both ar and nm look inside the .o file and read the ELF format. While we could put the bytecode in a .llvm section, neither tool would read that section. They would instead look for symbols in other sections. So, to be useful, we would now have to bloat the .o file with additional (normal) ELF sections that would allow tools like ar and nm to discover the symbols in the file. I think this is a big waste of time when we already have ar and nm replacements. As for the file command, the /etc/magic file can contain a single line that accurately identifies LLVM object files (the first 4 characters are "llvm").

> 2) It could mark the bytecode file with other bits of useful
> information, such as the OS and hardware on which the file was generated.

That's currently supported with the "target-triple" I just added to the bytecode format.

> 3) It may provide a convenient means of adding dynamic linking with
> other bytecode files.

What did you have in mind?

> 4) It may provide a convenient place to cache native translations for
> use with the JIT.

It doesn't sound convenient to me.
It would be faster to just mmap a whole region of memory, with some kind of index, onto disk and reload it later.

> Here are the disadvantages I see:
>
> 1) Increased disk usage. For example, symbol table information would
> duplicate the information already in the bytecode file.

Right ;>

> o Compiler Driver Name
>
> I'd vote for either llvmcc (llvm compiler collection) or llvmcd (llvm
> compiler driver). To be more convenient, we could call it llc (LLvm
> Compiler) or llcd (LLvm Compiler Driver). Calling it llc would require
> renaming llc to something else, which might be appropriate since I view
> llc as a "code generator" and not as a "compiler" (although both terms
> are technically accurate).

I'd vote to leave llc alone. However, I like your shortening idea. It makes ll-build much more tenable.

Reid.
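[Editor's note: since Reid floats the idea of a hand-written parser above, here is a rough idea of how small such a thing can be. This is only a sketch, assuming the format is exactly "[section]" headers with name=value lines and '#' comments; none of these names come from the actual llvmcc design.]

    #include <fstream>
    #include <istream>
    #include <map>
    #include <string>

    // Parsed configuration: Config["section"]["name"] == "value".
    typedef std::map<std::string, std::map<std::string, std::string> > Config;

    // Strip leading and trailing whitespace from a line.
    static std::string trim(const std::string &S) {
      std::string::size_type B = S.find_first_not_of(" \t\r");
      std::string::size_type E = S.find_last_not_of(" \t\r");
      return B == std::string::npos ? "" : S.substr(B, E - B + 1);
    }

    // Read "[section]" headers and "name=value" lines; return false on a
    // malformed line.
    bool parseConfig(std::istream &In, Config &Out) {
      std::string Line, Section;
      while (std::getline(In, Line)) {
        Line = trim(Line);
        if (Line.empty() || Line[0] == '#')      // blank line or comment
          continue;
        if (Line[0] == '[') {                    // section header
          std::string::size_type Close = Line.find(']');
          if (Close == std::string::npos) return false;
          Section = trim(Line.substr(1, Close - 1));
        } else {                                 // name=value entry
          std::string::size_type Eq = Line.find('=');
          if (Eq == std::string::npos) return false;
          Out[Section][trim(Line.substr(0, Eq))] = trim(Line.substr(Eq + 1));
        }
      }
      return true;
    }

Usage would be along the lines of: std::ifstream F("stacker.cfg"); Config C; parseConfig(F, C); followed by a lookup such as C["translator"]["command"]. That is roughly thirty lines, which is the order of magnitude Chris mentions in the next message.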
On Wed, 4 Aug 2004, John Criswell wrote:

> I thought I would chime in with some ideas and opinions:
>
> o Configuration Files
>
> If it isn't too much trouble, I think we should go with XML for the
> following reasons:
>
> 1) We wouldn't need to implement a parsing library. There are several
> XML parsing libraries available, and I'm guessing that they're available
> in several different programming languages (Reid, am I right on that?).

So that's the tension: with XML, there are lots of off-the-shelf tools that you can use to parse it. OTOH, this should be an extremely trivial file that does not need any parsing per se. Unless there is a *clear* advantage to doing so, we should not replace a custom 20-LOC parser with a gigantic library.

> 2) It makes it easier for other programmers to write tools that read,
> modify, and/or write the configuration file correctly. If my assumption
> about XML libraries being available in several different languages is
> correct, then that means we don't need to write a library for each
> language that people want to use.

I don't buy this at all. In particular, these files are provided by front-end designers for the sole consumption of the driver. NO other tools should be looking in these files; they should use the compiler driver directly.

> 3) I believe it would keep the format flexible enough for future
> expansion (but again, Reid would know better here).

You can do this with any format you want; just include an explicit version number.

> Having configuration files that can be manipulated accurately is
> important for things like automatic installation, GUIs, configuration
> tools, etc.

Again, none of these tools should be using these files.

> o Object Files

... Misha did a great job responding to these ...

> 4) It may provide a convenient place to cache native translations for
> use with the JIT.

For native translation caching, we will just emit .so files eventually. It is no easier to attach a .so file to an existing ELF binary than it is to attach it to a .bc file. Also, we probably *don't* want to attach the cached translations to the executables, though I'm sure some will disagree strenuously with me :) In any case, this is still a way out.

> o Optimization options
>
> I agree with the idea of using -O<number> for increasing levels of
> optimization, with -O0 meaning no optimization. It's a pretty intuitive
> scheme, and many Makefiles that use GCC use the -O option.

The problem is that -O0 does *not* mean no optimization. In particular, with GCC, -O0 runs optimizations that reduce the compile time of the program (e.g. DCE) but do not impact its debuggability. Making the default option -O1 would help deal with this, but I'm still not convinced it's a good idea (lots of people have -O0 hard-coded into their makefiles). *shrug*

-Chris

-- http://llvm.cs.uiuc.edu/ http://nondot.org/sabre/