John Regehr wrote:> Say on that note here's something that I want to see: a formal semantics > for LLVM in for example higher order logic. This would probably not be > that difficult.Except that some aspects of the host platform leak through to .bc files. This may or may not be a problem.> Once the semantics exists, you can either prove once and for all the that > each LLVM->LLVM optimization pass preserves the semantics of the code, or > else just do translation validation.That would be quite nice indeed.> I've tried to sell some of the HOL folks on this idea (they have a formal > semantics for an ARM ISA, for example) but so far no dice.You could sell it to me. How much money do you have? ;-) But seriously, that is something I would very much like to see. One of the reasons why you have not had success may be that with current prover technology, formalizing LLVM semantics and carrying out proofs would be a fairly straightforward exercise, but not a trivial amount of work. Not a good combination in academic sense. On the other hand, formalizing LLVM and proving some properties would make for a very nice open source proof project, which would be an interesting beast. -- Pertti
On Thu, 2008-03-27 at 21:39 +0200, Pertti Kellomäki wrote:> Except that some aspects of the host platform leak through > to .bc files. This may or may not be a problem.A question about this occurred to me last night. BitC has only one machine-dependent type: word. The current runtime.h file typedef's this in a machine-dependent way, but all of the C code emitted by our current back end uses the typedef. Originally we intended to have a self-hosting compiler, and this provided us with a way to do bootstrap. LLVM has a back end that emits C code. How machine dependent is the emitted C code? shap
> Except that some aspects of the host platform leak through > to .bc files. This may or may not be a problem.I know a bit about how Michael Norrish dealt with this sort of thing in his formal semantics for C. For example, integer width is modeled as a constant, but one with an unspecified value. Other C level choices such as order of evaluation of function arguments are modeled using nondeterministic choice. My understanding is that these are not too difficult at the level of the language specification, but that they make proofs about portable C code very difficult. I don't have a sense if the platform leaks in LLVM are bad ones or not.> One of the reasons why you have not had success may be that with current > prover technology, formalizing LLVM semantics and carrying out proofs > would be a fairly straightforward exercise, but not a trivial amount > of work. Not a good combination in academic sense.But the same is true about LLVM itself, yes? Sometimes researchers take on large infrastructure-type projects because these are great enablers of new research. This is what tenure is for. Or did Vikram start this project as an assistant professor? If so, my hat is off :). John Regehr
On Mar 27, 2008, at 3:32 PM, John Regehr wrote:> This is what tenure is for. Or did Vikram start this > project as an assistant professor? If so, my hat is off :). > > John RegehrWell, Chris and I did start this when I was a very junior assistant professor, 2nd year in fact! But the hat-tipping should go to Chris: he took our initial ideas, ran with it, made it his MS thesis, and eventually took it to a level well beyond your typical research infrastructure. --Vikram
Jonathan S. Shapiro wrote:> LLVM has a back end that emits C code. How machine dependent is the > emitted C code?The dependence I was referring to are things like sizeof(int). I doubt the emitted C code has many dependences. -- Pertti