thr3ads.net - llvm dev - [LLVMdev] LLVM Language Reference Strictness [Oct 2011]


Shea Levy

2011-Oct-20 09:37 UTC

[LLVMdev] LLVM Language Reference Strictness

On 10/19/11 11:58 PM, Eli Friedman wrote:
> On Wed, Oct 19, 2011 at 8:20 PM, Shea Levy <shea at shealevy.com> wrote:
>> 2. Are target-specific behaviors documented for each supported target?
> When anything has target-specific behavior, that fact should be
> documented.  Beyond that, if you have a question about what some
> construct is supposed to do, please ask.
What I meant was: for a given target-specific behavior, is there
anywhere I can look to see what the behavior specifically is for, say,
i686-pc-linux, like you are supposed to be able to for
implementation-defined behaviors in C?
>> 5. Are there any language features with non-performance related
>> semantic import (e.g. annotations, instructions, intrinsic functions,
>> types, etc.) that are not specified by the reference but are
>> nevertheless implemented in the build system?
> You should be able to analyze the semantics of IR accurately based
> purely on information encoded into the IR.  Every instruction, type,
> attribute etc. should be documented in LangRef.  Platform-specific
> intrinsics are not documented, but can generally be treated like a
> call to an external function.
Platform-specific intrinsics are not documented anywhere, or just not in
the language reference?
>> If so, is it expected that all
>> such discovered and possibly corrected deviations will have associated
>> bug reports, or might some be corrected in the development repository
>> without documentation of the issue outside of a commit message? In other
>> words, if I'm working with, say, llvm 2.9 and want to find all
>> deviations known to upstream, can I just browse bug reports or will I
>> have to go through commit logs as well?
> LLVM Bugzilla doesn't contain an entry for every bug; to find every
> fix, you'll have to go through commit logs.  Not sure what you're
> trying to do here, though.
Some more detail on my project: I'm mostly doing this so I can get
introduced to the field of static analysis, learn what its big problems
are and what's just impossible with it, etc. To that end, however, I've
decided to try to implement a set of checks that might actually be 
useful, to me at least. In particular, I want to see how many of the 
run-time checks made in hardware when a CPU is in user-mode and memory 
is segmented can be proven to be unnecessary at compile-time. The 
(probably impossible) end-goals to this project would be a) that every 
program which passes its checks would be as safe to run in kernel mode 
with full memory access as it would be in user mode and b) that a 
not-insignificant subset of well-written programs passes its checks. If 
I ever reach the point that I'm actually using this thing to run 
untrusted code in kernel mode, I'll want to know about as many 
deviations from the spec as possible to know if they might affect the 
reasoning my program uses.

Thanks for the help,
Shea Levy

Don Quixote de la Mancha

2011-Oct-20 09:47 UTC

[LLVMdev] LLVM Language Reference Strictness

On Thu, Oct 20, 2011 at 2:37 AM, Shea Levy <shea at shealevy.com> wrote:
> The (probably impossible) end-goals to this project would be a) that every
> program which passes its checks would be as safe to run in kernel mode
> with full memory access as it would be in user mode
That would be a very useful thing to have for embedded systems.  Some,
such as uClinux, run ports of "safe" operating systems with the safety
stripped out, whereas others, like Texas Instruments' DSP/BIOS, run
entirely as a single operating system kernel.
-- 
Don Quixote de la Mancha
Dulcinea Technologies Corporation
Software of Elegance and Beauty
http://www.dulcineatech.com
quixote at dulcineatech.com

Jim Grosbach

2011-Oct-20 13:50 UTC

[LLVMdev] LLVM Language Reference Strictness

On Oct 20, 2011, at 2:37 AM, Shea Levy wrote:
> On 10/19/11 11:58 PM, Eli Friedman wrote:
>> On Wed, Oct 19, 2011 at 8:20 PM, Shea Levy <shea at shealevy.com> wrote:
>>> 2. Are target-specific behaviors documented for each supported target?
>> When anything has target-specific behavior, that fact should be
>> documented.  Beyond that, if you have a question about what some
>> construct is supposed to do, please ask.
> What I meant was: for a given target-specific behavior, is there 
> anywhere I can look to see what the behavior specifically is for, say, 
> i686-pc-linux, like you are supposed to be able to for 
> implementation-defined behaviors in C?
For the level of specificity you're looking for, just the source code
itself. The LLVM IR language documentation is not, and isn't intended to be,
a true language standard document in the same way that the C or C++ standards
are. For any given case, check the docs first, and if your question isn't
answered there, check the source code of the target(s) you're interested in.

Regards,
  Jim
>>> 5. Are there any language features with non-performance related
>>> semantic import (e.g. annotations, instructions, intrinsic functions,
>>> types, etc.) that are not specified by the reference but are
>>> nevertheless implemented in the build system?
>> You should be able to analyze the semantics of IR accurately based
>> purely on information encoded into the IR.  Every instruction, type,
>> attribute etc. should be documented in LangRef.  Platform-specific
>> intrinsics are not documented, but can generally be treated like a
>> call to an external function.
> Platform-specific intrinsics are not documented anywhere, or just not in 
> the language reference?
>>> If so, is it expected that all
>>> such discovered and possibly corrected deviations will have associated
>>> bug reports, or might some be corrected in the development repository
>>> without documentation of the issue outside of a commit message? In other
>>> words, if I'm working with, say, llvm 2.9 and want to find all
>>> deviations known to upstream, can I just browse bug reports or will I
>>> have to go through commit logs as well?
>> LLVM Bugzilla doesn't contain an entry for every bug; to find every
>> fix, you'll have to go through commit logs.  Not sure what you're
>> trying to do here, though.
> Some more detail on my project: I'm mostly doing this so I can get 
> introduced to the field of static analysis, learn what its big problems
> are and what's just impossible with it, etc. To that end, however, I've
> decided to try to implement a set of checks that might actually be 
> useful, to me at least. In particular, I want to see how many of the 
> run-time checks made in hardware when a CPU is in user-mode and memory 
> is segmented can be proven to be unnecessary at compile-time. The 
> (probably impossible) end-goals to this project would be a) that every 
> program which passes its checks would be as safe to run in kernel mode 
> with full memory access as it would be in user mode and b) that a 
> not-insignificant subset of well-written programs passes its checks. If 
> I ever reach the point that I'm actually using this thing to run 
> untrusted code in kernel mode, I'll want to know about as many 
> deviations from the spec as possible to know if they might affect the 
> reasoning my program uses.
> 
> Thanks for the help,
> Shea Levy
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Duncan Sands

2011-Oct-20 14:42 UTC

[LLVMdev] LLVM Language Reference Strictness

> For the level of specificity you're looking for, just the source code
> itself. The LLVM IR language documentation is not, and isn't intended to be,
> a true language standard document in the same way that the C or C++ standards
> are. For any given case, check the docs first, and if your question isn't
> answered there, check the source code of the target(s) you're interested in.
And once you've understood, submit a doc patch explaining it :)

Ciao, Duncan.

John Criswell

2011-Oct-20 14:56 UTC

[LLVMdev] LLVM Language Reference Strictness

On 10/20/11 4:47 AM, Don Quixote de la Mancha wrote:
> On Thu, Oct 20, 2011 at 2:37 AM, Shea Levy <shea at shealevy.com> wrote:
>> The (probably impossible) end-goals to this project would be a) that every
>> program which passes its checks would be as safe to run in kernel mode
>> with full memory access as it would be in user mode
> That would be a very useful thing to have for embedded systems.  Some,
> such as uClinux, run ports of "safe" operating systems with the safety
> stripped out, whereas others, like Texas Instruments' DSP/BIOS, run
> entirely as a single operating system kernel.
You may want to read the early SAFECode papers 
(http://sva.cs.illinois.edu/pubs.html).  The papers "Memory Safety 
without Run-time Checks or Garbage Collection" 
(http://llvm.org/pubs/2003-05-05-LCTES03-CodeSafety.html) and "Ensuring 
Code Safety Without Runtime Checks for Real-Time Control Systems" 
(http://llvm.org/pubs/2002-08-08-CASES02-ControlC.html) are particularly 
relevant and, I believe, would allow you to run user code in kernel 
space safely without resorting to run-time checks.

You might also want to read the SVA paper from Usenix Security 2009 
(http://llvm.org/pubs/2009-08-12-UsenixSecurity-SafeSVAOS.html) and the 
HyperSafe paper 
(http://www.csc.ncsu.edu/faculty/jiang/pubs/OAKLAND10.pdf) to get an 
idea of other memory safety concerns beyond your standard compiler loads 
and stores.  Despite the fact that these papers describe memory safety 
issues for OS kernel/hypervisor code, these issues also affect 
user-space code (e.g., threading libraries, mmap(), etc.).

As an FYI, SAFECode later evolved into a system that could support 
general C programs by using a combination of static analysis, an 
optional memory-region transform, and run-time checks; that is the 
system available today (http://sva.cs.illinois.edu).  The code for those 
older systems should still be in the safecode SVN repository, though, so 
I think you could rebuild the original system which rejected type-unsafe 
programs if you wanted to do so.

On a final note, as long as you have the whole program to analyze, using 
something like SAFECode in its modern form should permit you to run 
user-space code in the kernel as long as you're willing to accept having 
run-time checks.  That said, there is still a fair amount of work to 
make sure that the memory safety is airtight (potentially enough to 
warrant a research paper).  The two issues that come to mind off-hand are:

1)  There's an issue with using the points-to analysis (DSA) on C++ 
programs and C programs that mimic vtables; the points-to analysis 
cannot always tell when it has analyzed the complete program, and that 
can cause SAFECode's checks to lose completeness.

2) There's still some work left for the run-time checks and static 
analysis.  For example, some of the C standard library still needs 
run-time checks (or needs to be processed with SAFECode).  Special checks are 
needed on calls to mmap().  Inline assembly needs to be handled 
somehow.  Of course, you can choose to write a static analysis that 
detects use of those features and rejects the program if those features 
are found.

If you're interested in chatting further on this topic, please feel free 
to email either me or the svadev at cs.illinois.edu mailing list.

-- John T.
