Currently, llvm bytecode files must begin with llvm\d where \d = {0,1,2}.
Have you considered allowing a "hash bang path" to precede the llvm
magic number?
This would allow llvm modules to be executable on UNIX systems (and
under cygwin).
Would the community find this useful? I like the idea of
platform-independent binaries being able to masquerade as native
executables.
I'm thinking of something like the following:
(1) Check for the llvm magic number at the beginning of the file.
If this test passes, proceed normally.
(2) Otherwise, check that the first two bytes of the file are "#!".
If this test fails, indicate a corrupt/invalid bytecode file and
exit with error code.
(3) Scan up to 256 bytes after the #!, looking for a \n.
If no \n is found, indicate a corrupt/invalid bytecode file and exit
with error code.
(4) If the normal llvm magic number doesn't immediately follow the first \n,
then indicate a corrupt/invalid bytecode file and exit with error code.
(5) Parse the file as normal following the llvm magic number.
This seems like a fairly isolated change, and I'd be interested in
implementing it and submitting a patch if others think this is a
good idea.
I wouldn't change the way bytecode files are generated by the compiler.
How does that sound?
-Karl
P.S. It might be good to put references to Ken Thompson's Dis virtual
machine design paper and Michael Franz's SafeTSA Java performance
paper on the llvm website. It was these two papers that caused me to
start looking for an open-source VM that used a (nearly) infinite-
register bytecode (SSA or memory machine) instead of the very common
two-register bytecode (the top two values on a stack machine's
stack... not counting the instruction pointer).