thr3ads.net - llvm dev - [LLVMdev] Compiler Driver Decisions [Aug 2004]

Reid Spencer

2004-Aug-04 15:44 UTC

[LLVMdev] Compiler Driver Decisions

On Wed, 2004-08-04 at 08:24, John Criswell wrote:
> o Configuration Files
> 
> If it isn't too much trouble, I think we should go with XML for the 
> following reasons:
> 
> 1) We wouldn't need to implement a parsing library.  There are several 
> XML parsing libraries available, and I'm guessing that they're available
> in several different programming languages (Reid, am I right on that?).
Yes, there are many to choose from. But, some of them are larger than
LLVM :). We'd choose expat (fast, simple, small, good compatibility,
lacks features we don't need). Either that or just write a really simple
recursive descent parser.
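
For illustration only, here is a minimal sketch of the expat route, using
Python's standard xml.parsers.expat bindings. The <driver> and <tool>
element names, their attributes, and the paths are hypothetical; no
configuration schema has been agreed in this thread.

```python
import xml.parsers.expat

# Hypothetical driver configuration; the element and attribute names are
# invented for illustration only.
CONFIG = """<?xml version="1.0"?>
<driver>
  <tool name="llc" path="/usr/local/bin/llc"/>
  <tool name="opt" path="/usr/local/bin/opt"/>
</driver>
"""

def read_tools(text):
    """Return a {name: path} dict built from <tool> elements."""
    tools = {}

    def start_element(name, attrs):
        # expat calls this once per opening tag, with attributes as a dict.
        if name == "tool":
            tools[attrs["name"]] = attrs["path"]

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start_element
    parser.Parse(text, True)  # True marks this as the final chunk of input
    return tools

print(read_tools(CONFIG))
```

The same handler-callback shape applies when expat is used from C; only
the binding differs.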
> 
> 2) It makes it easier for other programmers to write tools that read, 
> modify, and/or write the configuration file correctly.  If my assumption 
> about XML libraries being available in several different languages is 
> correct, then that means we don't need to write a library for each 
> language that people want to use.
Not sure what you mean here. What's an XML library and are they supposed
to be available in different natural languages or different computer
languages or programming languages or ?? Do you mean natural languages?
> 3) I believe it would keep the format flexible enough for future 
> expansion (but again, Reid would know better here).
Yes. It wouldn't be painless, but going from DTD1 -> DTD2 is much less
painful than going from INI -> XML. That is, the ENTIRE format doesn't
have to change, it's just incrementally changing its document type
definition within the XML format.
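
A small sketch of what "incremental" could mean in practice, assuming a
hypothetical format whose second revision adds an <option> element. It
uses Python's xml.etree.ElementTree, which does not validate against a
DTD, so it only illustrates that a reader written for the first revision
keeps working on the second.

```python
import xml.etree.ElementTree as ET

# Two revisions of a hypothetical configuration format. The second adds an
# <option> element; nothing else about the file changes.
V1 = '<driver version="1"><tool name="llc"/></driver>'
V2 = '<driver version="2"><tool name="llc"/><option name="O2"/></driver>'

def tool_names(text):
    """Collect tool names, ignoring any elements this reader does not know."""
    root = ET.fromstring(text)
    return [tool.get("name") for tool in root.findall("tool")]

print(tool_names(V1), tool_names(V2))
```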
> 
> Having configuration files that can be manipulated accurately is 
> important for things like automatic installation, GUI's, configuration 
> tools, etc.
Yes, that was my main argument too .. precision for us and others.
> 
> o Object Files
> 
> I've noticed that there's a general agreement that we should not 
> encapsulate LLVM bytecode files inside of another object format (such as 
> ELF).  However, I'd like to pose a few potential benefits that 
> encapsulation in ELF might provide:
> 
> 1) It may provide a way for standard UNIX tools to handle bytecode files 
> without modification.  For example, programs like ar, nm, and file all 
> take advantage of the ELF format.  If we generated LLVM ELF files, we 
> wouldn't need to write our own nm and ar implementations and port them 
> to various platforms.
Consider this: both ar and nm look inside the .o file and read the ELF
format. While we could put the bytecode in a .llvm section, neither tool
would read that section. They would instead look for symbols in other
sections. So, to be useful, we would now have to bloat the .o file with
additional (normal) ELF sections that would allow tools like ar and nm
to discover the symbols in the file.  I think this is a big waste of
time when we already have ar and nm replacements. 

As for the file command, the /etc/magic file can contain a single line
that accurately identifies LLVM object files (first 4 chars are llvm).
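
As a rough illustration of that check, here is the same test written out
in Python. The only fact taken from this message is that the file starts
with the four characters "llvm"; the function name and the stand-in file
contents are hypothetical.

```python
import os
import tempfile

LLVM_MAGIC = b"llvm"  # per the message: the first 4 chars are "llvm"

def looks_like_llvm_bytecode(path):
    """Return True if the file at `path` begins with the 4-byte signature."""
    with open(path, "rb") as f:
        return f.read(4) == LLVM_MAGIC

# Demonstration with a stand-in file; the bytes after the signature are
# placeholders, not real bytecode.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"llvm" + b"\x00" * 8)
print(looks_like_llvm_bytecode(tmp.name))
os.unlink(tmp.name)
```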
> 
> 2) It could mark the bytecode file with other bits of useful 
> information, such as the OS and hardware on which the file was generated.
That's currently supported with the "target-triple" I just added to the
bytecode format.
> 
> 3) It may provide a convenient means of adding dynamic linking with 
> other bytecode files.
What did you have in mind?
> 
> 4) It may provide a convenient place to cache native translations for 
> use with the JIT.
It doesn't sound convenient to me. It would be faster to just mmap a
whole region of memory with some kind of index onto disk and reload it
later.
> 
> Here are the disadvantages I see:
> 
> 1) Increased disk usage.  For example, symbol table information would 
> duplicate the information already in the bytecode file.
Right ;>
> o Compiler Driver Name
> 
> I'd vote for either llvmcc (llvm compiler collection) or llvmcd (llvm 
> compiler driver).  To be more convenient, we could call it llc (LLvm 
> Compiler) or llcd (LLvm Compiler Driver).  Calling it llc would require 
> renaming llc to something else, which might be appropriate since I view 
> llc as a "code generator" and not as a "compiler" (although both terms
> are technically accurate).
I'd vote to leave llc alone. However, I like your shortening idea. It
makes ll-build much more tenable.



John Criswell

2004-Aug-04 19:21 UTC

[LLVMdev] Compiler Driver Decisions

Reid Spencer wrote:
> On Wed, 2004-08-04 at 08:24, John Criswell wrote:
> 
> 
>>o Configuration Files
>>
>>If it isn't too much trouble, I think we should go with XML for the 
>>following reasons:
>>
>>1) We wouldn't need to implement a parsing library.  There are several
>>XML parsing libraries available, and I'm guessing that they're available
>>in several different programming languages (Reid, am I right on that?).
> 
> 
> Yes, there are many to choose from. But, some of them are larger than
> LLVM :). We'd choose expat (fast, simple, small, good compatibility,
> lacks features we don't need). Either that or just write a really simple
> recursive descent parser.
> 
> 
>>2) It makes it easier for other programmers to write tools that read, 
>>modify, and/or write the configuration file correctly.  If my assumption
>>about XML libraries being available in several different languages is 
>>correct, then that means we don't need to write a library for each 
>>language that people want to use.
> 
> 
> Not sure what you mean here. What's an XML library and are they supposed
> to be available in different natural languages or different computer
> languages or programming languages or ?? Do you mean natural languages?
I meant programming languages.  Python already has interfaces to XML.  I 
bet Perl has a module to parse XML too.
> 
> 
>>3) I believe it would keep the format flexible enough for future 
>>expansion (but again, Reid would know better here).
> 
> 
> Yes. It wouldn't be painless, but going from DTD1 -> DTD2 is much less
> painful than going from INI -> XML. That is, the ENTIRE format doesn't
> have to change, it's just incrementally changing its document type
> definition within the XML format.
> 
> 
>>Having configuration files that can be manipulated accurately is 
>>important for things like automatic installation, GUI's, configuration
>>tools, etc.
> 
> 
> Yes, that was my main argument too .. precision for us and others.
My general impression is that when one rolls their own file format, 
others write shell scripts that handle the common cases but usually foul 
up on corner cases.  I figured XML would reduce the likelihood that this 
scenario would happen.

Of course, with XML, programs could still do things incorrectly, but it 
would be easier to get it right.
> 
> 
>>o Object Files
>>
>>I've noticed that there's a general agreement that we should not
>>encapsulate LLVM bytecode files inside of another object format (such as
>>ELF).  However, I'd like to pose a few potential benefits that 
>>encapsulation in ELF might provide:
>>
>>1) It may provide a way for standard UNIX tools to handle bytecode files
>>without modification.  For example, programs like ar, nm, and file all 
>>take advantage of the ELF format.  If we generated LLVM ELF files, we 
>>wouldn't need to write our own nm and ar implementations and port them
>>to various platforms.
> 
> 
> Consider this: both ar and nm look inside the .o file and read the ELF
> format. While we could put the bytecode in a .llvm section, neither tool
> would read that section. They would instead look for symbols in other
> sections. So, to be useful, we would now have to bloat the .o file with
> additional (normal) ELF sections that would allow tools like ar and nm
> to discover the symbols in the file.  I think this is a big waste of
> time when we already have ar and nm replacements. 
In reply to Misha's comment, this is how nm and ar would work without 
modification: the symbol information would have to be duplicated in the 
ELF section that holds the symbol table.

Let me back up for a minute.  As far as LLVM object files and 
executables go, here are the features that I would want, in order of 
importance:

1) Automatic execution of bytecode executable files.

I would like to be able to run bytecode files directly, the same way I 
can run a shell script, Python program, or ELF executable directly.  I 
think having to specify an interpreter on the command line (like java 
program.class) or having to enter a different execution environment 
(llee /bin/sh) is inconvenient and doesn't integrate into the system as 
well as it could.

2) Integration with system tools.

It would be nice if a common set of tools could manipulate bytecode 
files.  Having the system ar, nm, and file programs work on bytecode and 
native code object files would be great.  Having LLVM-provided versions 
that do the same thing would be second best.  A parallel set of LLVM 
tools is third best.

ELF encapsulation gets us #2, which, at this point, I think I'm willing 
to say isn't all that important.  I think LLVM-provided tools will do.

In regards to Misha's comments about the automatic execution of bytecode 
files, there are several ways to do it:

1) Have bytecode files start with #!<JIT/llee/whatever> (portable)
2) Encapsulate with ELF
3) Register the type with the kernel (Linux only)

I don't really care for the llee approach, as it can be broken with 
subsequent LD_PRELOADs, requires that I enter an alternative execution 
environment, and requires that I remember to run llee.  I believe the 
methods above are less error-prone and integrate into the system more 
cleanly.

-- John T.

-- 
*********************************************************************
* John T. Criswell                         Email: criswell at uiuc.edu *
* Research Programmer                                               *
* University of Illinois at Urbana-Champaign                        *
*                                                                   *
* "It's today!" said Piglet. "My favorite day," said Pooh.          *
*********************************************************************

Reid Spencer

2004-Aug-04 19:36 UTC

[LLVMdev] Compiler Driver Decisions

On Wed, 2004-08-04 at 12:21, John Criswell wrote:
> In regards to Misha's comments about the automatic execution of bytecode
> files, there are several ways to do it:
> 
> 1) Have bytecode files start with #!<JIT/llee/whatever> (portable)
> 2) Encapsulate with ELF
> 3) Register the type with the kernel (Linux only)
> 
> I don't really care for the llee approach, as it can be broken with 
> subsequent LD_PRELOADs, requires that I enter an alternative execution 
> environment, and requires that I remember to run llee.  I believe the 
> methods above are less error-prone and integrate into the system more 
> cleanly.
Unfortunately, the #!... convention is not supported on all operating
systems although it is very common on UNIX. I think we're going to end
up with a mixture of things:

1. The llee (llvm-run) approach needs to be maintained for those systems
    where all you can do is run a program (think OS/390, Windows, etc.)
2. We can do the #! trick now without modifying the bytecode file. We
    have a convention like this:
    #!/path/to/llvm-run -
    llvm......(bytecode)
    When llvm-run is given the - option, it reads the rest of the file
    as bytecode. This is how a shell works too.
3. We might want to eventually have an installer that registers the type
    with the kernel but I think that's a long way off. We should
    concentrate effort on items 1. and 2. above.
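
To make item 2 concrete, here is a minimal sketch of the file-splitting
step a runner would need, written in Python. It assumes only the layout
shown above (a #! line followed by bytes starting with "llvm"); the
function names are hypothetical and this is not how llvm-run is
implemented.

```python
def split_shebang(data):
    """Split a file image into (interpreter_line, payload).

    If the data starts with "#!", everything before the first newline is
    the interpreter line and everything after it is the payload. Otherwise
    the interpreter line is None and the payload is the data unchanged.
    """
    if data.startswith(b"#!"):
        line, _, rest = data.partition(b"\n")
        return line, rest
    return None, data

def bytecode_payload(data):
    """Return the bytecode portion, checking for the 'llvm' signature."""
    _, payload = split_shebang(data)
    if not payload.startswith(b"llvm"):
        raise ValueError("payload does not start with the 'llvm' signature")
    return payload

# Stand-in file image; the bytes after "llvm" are placeholders.
image = b"#!/path/to/llvm-run -\n" + b"llvm" + b"\x01\x02\x03"
print(bytecode_payload(image))
```

Because a file without the #! line passes through unchanged, the same
routine also covers item 1, where the runner is invoked explicitly on a
plain bytecode file.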

I don't think we need to do any encapsulation with ELF to accomplish the
same goals.

Reid.
