On 10/19/11 11:58 PM, Eli Friedman wrote:
> On Wed, Oct 19, 2011 at 8:20 PM, Shea Levy <shea at shealevy.com> wrote:
>> 2. Are target-specific behaviors documented for each supported target?
> When anything has target-specific behavior, that fact should be
> documented. Beyond that, if you have a question about what some
> construct is supposed to do, please ask.

What I meant was: for a given target-specific behavior, is there anywhere I can look to see what the behavior specifically is for, say, i686-pc-linux, as you are supposed to be able to for implementation-defined behaviors in C?

>> 5. Are there any language features with non-performance related semantic
>> import (e.g. annotations, instructions, intrinsic functions, types, etc.)
>> that are not specified by the reference but are nevertheless implemented
>> in the build system?
> You should be able to analyze the semantics of IR accurately based
> purely on information encoded into the IR. Every instruction, type,
> attribute etc. should be documented in LangRef. Platform-specific
> intrinsics are not documented, but can generally be treated like a
> call to an external function.

Platform-specific intrinsics are not documented anywhere, or just not in the language reference?

>> If so, is it expected that all
>> such discovered and possibly corrected deviations will have associated
>> bug reports, or might some be corrected in the development repository
>> without documentation of the issue outside of a commit message? In other
>> words, if I'm working with, say, llvm 2.9 and want to find all
>> deviations known to upstream, can I just browse bug reports or will I
>> have to go through commit logs as well?
> LLVM Bugzilla doesn't contain an entry for every bug; to find every
> fix, you'll have to go through commit logs. Not sure what you're
> trying to do here, though.

Some more detail on my project: I'm mostly doing this so I can get introduced to the field of static analysis, learn what its big problems are and what's just impossible with it, etc. To that end, however, I've decided to try to implement a set of checks that might actually be useful, to me at least. In particular, I want to see how many of the run-time checks made in hardware when a CPU is in user-mode and memory is segmented can be proven to be unnecessary at compile-time. The (probably impossible) end-goals to this project would be a) that every program which passes its checks would be as safe to run in kernel mode with full memory access as it would be in user mode and b) that a not-insignificant subset of well-written programs passes its checks. If I ever reach the point that I'm actually using this thing to run untrusted code in kernel mode, I'll want to know about as many deviations from the spec as possible to know if they might affect the reasoning my program uses.

Thanks for the help,
Shea Levy
Don Quixote de la Mancha
2011-Oct-20 09:47 UTC
[LLVMdev] LLVM Language Reference Strictness
On Thu, Oct 20, 2011 at 2:37 AM, Shea Levy <shea at shealevy.com> wrote:
> The (probably impossible) end-goals to this project would be a) that every
> program which passes its checks would be as safe to run in kernel mode
> with full memory access as it would be in user mode

That would be a very useful thing to have for embedded systems. Some, such as uCLinux, run ports of "safe" operating systems with the safety stripped out, whereas others, like Texas Instruments' DSP/BIOS, run entirely as a single operating system kernel.

--
Don Quixote de la Mancha
Dulcinea Technologies Corporation
Software of Elegance and Beauty
http://www.dulcineatech.com
quixote at dulcineatech.com
On Oct 20, 2011, at 2:37 AM, Shea Levy wrote:
> On 10/19/11 11:58 PM, Eli Friedman wrote:
>> On Wed, Oct 19, 2011 at 8:20 PM, Shea Levy <shea at shealevy.com> wrote:
>>> 2. Are target-specific behaviors documented for each supported target?
>> When anything has target-specific behavior, that fact should be
>> documented. Beyond that, if you have a question about what some
>> construct is supposed to do, please ask.
> What I meant was: for a given target-specific behavior, is there
> anywhere I can look to see what the behavior specifically is for, say,
> i686-pc-linux, like you are supposed to be able to for
> implementation-defined behaviors in C?

For the level of specificity you're looking for, just the source code itself. The LLVM IR language documentation is not, and isn't intended to be, a true language standard document in the same way that the C or C++ standards are. For any given case, check the docs first, and if your question isn't answered there, check the source code of the target(s) you're interested in.

Regards,
Jim
> For the level of specificity you're looking for, just the source code
> itself. The LLVM IR language documentation is not, and isn't intended to
> be, a true language standard document in the same way that the C or C++
> standards are. For any given case, check the docs first, and if your
> question isn't answered there, check the source code of the target(s)
> you're interested in.

And once you've understood, submit a doc patch explaining it :)

Ciao, Duncan.
On 10/20/11 4:47 AM, Don Quixote de la Mancha wrote:
> On Thu, Oct 20, 2011 at 2:37 AM, Shea Levy <shea at shealevy.com> wrote:
>> The (probably impossible) end-goals to this project would be a) that every
>> program which passes its checks would be as safe to run in kernel mode
>> with full memory access as it would be in user mode
> That would be a very useful thing to have for embedded systems. Some
> such as uCLinux run ports of "safe" operating systems with the safety
> stripped out, whereas others like Texas Instruments' DSP/BIOS run
> entirely as a single operating system kernel.

You may want to read the early SAFECode papers (http://sva.cs.illinois.edu/pubs.html). The papers "Memory Safety without Run-time Checks or Garbage Collection" (http://llvm.org/pubs/2003-05-05-LCTES03-CodeSafety.html) and "Ensuring Code Safety Without Runtime Checks for Real-Time Control Systems" (http://llvm.org/pubs/2002-08-08-CASES02-ControlC.html) are particularly relevant and, I believe, would allow you to run user code in kernel space safely without resorting to run-time checks.

You might also want to read the SVA paper from Usenix Security 2009 (http://llvm.org/pubs/2009-08-12-UsenixSecurity-SafeSVAOS.html) and the HyperSafe paper (http://www.csc.ncsu.edu/faculty/jiang/pubs/OAKLAND10.pdf) to get an idea of other memory safety concerns beyond your standard compiler loads and stores. Although these papers describe memory safety issues for OS kernel/hypervisor code, these issues also affect user-space code (e.g., threading libraries, mmap(), etc.).

As an FYI, SAFECode later evolved into a system that could support general C programs by using a combination of static analysis, an optional memory-region transform, and run-time checks; that is the system available today (http://sva.cs.illinois.edu). The code for those older systems should still be in the safecode SVN repository, though, so I think you could rebuild the original system, which rejected type-unsafe programs, if you wanted to do so.

On a final note, as long as you have the whole program to analyze, using something like SAFECode in its modern form should permit you to run user-space code in the kernel as long as you're willing to accept having run-time checks. That said, there is still a fair amount of work to make sure that the memory safety is airtight (potentially enough to warrant a research paper). The two issues that come to mind off-hand are:

1) There's an issue with using the points-to analysis (DSA) on C++ programs and C programs that mimic vtables; the points-to analysis cannot always tell when it has analyzed the complete program, and that can cause SAFECode's checks to lose completeness.

2) There's still some work left for the run-time checks and static analysis. For example, some of the C standard library still needs run-time checks (or needs to be processed with SAFECode). Special checks are needed on calls to mmap(). Inline assembly needs to be handled somehow.

Of course, you can choose to write a static analysis that detects use of those features and rejects the program if those features are found.

If you're interested in chatting further on this topic, please feel free to email either me or the svadev at cs.illinois.edu mailing list.

-- John T.
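
[Editorial sketch] The "detect and reject" option John mentions could be prototyped roughly as follows. This is only a minimal illustration in Python, under several assumptions that are not from the thread: it scans textual LLVM IR (a .ll file) with regular expressions instead of using the LLVM API, it only sees direct calls (a call through a function pointer would slip past it), and the names FORBIDDEN_CALLEES and find_violations are hypothetical. A real checker would more likely be written as an LLVM pass.

```python
import re

# Hypothetical deny-list: external functions the checker refuses to reason about.
FORBIDDEN_CALLEES = {"mmap"}

# Inline assembly shows up in textual IR as a call/invoke whose callee is an
# asm expression, e.g.:  call void asm sideeffect "nop", ""()
INLINE_ASM = re.compile(r'\b(?:call|invoke)\b[^\n]*?\basm\s+(?:\w+\s+)*"')

# Module-level inline assembly, e.g.:  module asm "..."
MODULE_ASM = re.compile(r'^module\s+asm\b')

# Direct calls to a named function, e.g.:  call i8* @mmap(...)
# Assumption: quoted names such as @"odd name" are not handled.
DIRECT_CALL = re.compile(r'\b(?:call|invoke)\b[^\n]*?@([\w.$-]+)\s*\(')


def find_violations(ir_text):
    """Return (line number, reason) pairs for constructs the checker rejects."""
    violations = []
    for lineno, line in enumerate(ir_text.splitlines(), start=1):
        stripped = line.strip()
        # Skip blank lines and whole-line comments.
        if not stripped or stripped.startswith(";"):
            continue
        if MODULE_ASM.match(stripped) or INLINE_ASM.search(stripped):
            violations.append((lineno, "inline assembly"))
        call = DIRECT_CALL.search(stripped)
        if call and call.group(1) in FORBIDDEN_CALLEES:
            violations.append((lineno, "call to @" + call.group(1)))
    return violations


SAMPLE_IR = '''
declare i8* @mmap(i8*, i64, i32, i32, i32, i64)

define void @f() {
entry:
  call void asm sideeffect "nop", ""()
  %p = call i8* @mmap(i8* null, i64 4096, i32 3, i32 34, i32 -1, i64 0)
  ret void
}
'''

if __name__ == "__main__":
    for lineno, reason in find_violations(SAMPLE_IR):
        print("line %d: rejected (%s)" % (lineno, reason))
```

The sketch errs on the side of rejecting: a false positive only costs a program that could have been accepted, whereas a missed construct would undermine the safety argument. The declaration of @mmap alone is not flagged, only an actual call to it.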