thr3ads.net - llvm dev - [LLVMdev] LLVM Language Reference Strictness [Oct 2011]

Shea Levy

2011-Oct-20 03:20 UTC

[LLVMdev] LLVM Language Reference Strictness

Hello,

I'd like to write a program that performs static analysis of code at the
LLVM assembly/bitcode level, and to do so I plan on extensively 
referencing the language reference. As I hope to eventually use this 
tool as part of a security analysis of untrusted code, I need to be 
rather strict in my interpretation of the document. As such, I have some 
questions about how the implementers interpret the document (each 
question assumes we're considering a single fixed release version):

1. Is http://www.llvm.org/releases/<version>/docs/LangRef.html the most 
authoritative reference for a given version aside from the source code 
itself?

2. Are target-specific behaviors documented for each supported target?

3. Does undefined behavior semantically invalidate the entire program or 
is its unpredictable effect limited in scope somehow?

4. Are any behaviors undefined by virtue of not being specified in the 
reference, or are all scenarios that lead to undefined behavior 
explicitly identified as such?

5. Are there any language features with non-performance related semantic 
import (e.g. annotations, instructions, intrinsic functions, types, etc.)
that are not specified by the reference but are nevertheless implemented 
in the build system?

6. Are all deviations from the reference, no matter how minor, 
considered bugs (either in the implementation or the spec)? If not, what 
deviations are considered acceptable? If so, is it expected that all 
such discovered and possibly corrected deviations will have associated 
bug reports, or might some be corrected in the development repository 
without documentation of the issue outside of a commit message? In other 
words, if I'm working with, say, llvm 2.9 and want to find all 
deviations known to upstream, can I just browse bug reports or will I 
have to go through commit logs as well?

These are the questions I have for now, but I may have more as I go 
along. Is this the appropriate place to ask this kind of thing?

Thanks,
Shea Levy

Eli Friedman

2011-Oct-20 03:58 UTC

On Wed, Oct 19, 2011 at 8:20 PM, Shea Levy <shea at shealevy.com> wrote:
> Hello,
>
> I'd like write a program that performs static analysis of code at the
> LLVM assembly/bitcode level, and to do so I plan on extensively
> referencing the language reference. As I hope to eventually use this
> tool as part of a security analysis of untrusted code, I need to be
> rather strict in my interpretation of the document. As such, I have some
> questions about how the implementers interpret the document (each
> question assumes we're considering a single fixed release version):
>
> 1. Is http://www.llvm.org/releases/<version>/docs/LangRef.html the most
> authoritative reference for a given version aside from the source code
> itself?
Yes.
> 2. Are target-specific behaviors documented for each supported target?
When anything has target-specific behavior, that fact should be
documented.  Beyond that, if you have a question about what some
construct is supposed to do, please ask.
> 3. Does undefined behavior semantically invalidate the entire program or
> is its unpredictable effect limited in scope somehow?
There is no limit to the scope of undefined behavior.
> 4. Are any behaviors undefined by virtue of not being specified in the
> reference, or are all scenarios that lead to undefined behavior
> explicitly identified as such?
We really want to explicitly identify them all in the reference; if
you have a question about some specific case, please ask.
> 5. Are there any language features with non-performance related semantic
> import (e.g annotations, instructions, intrinsic functions, types, etc.)
> that are not specified by the reference but are nevertheless implemented
> in the build system?
You should be able to analyze the semantics of IR accurately based
purely on information encoded into the IR.  Every instruction, type,
attribute etc. should be documented in LangRef.  Platform-specific
intrinsics are not documented, but can generally be treated like a
call to an external function.
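
As an illustration of the point above, an analysis tool might sort callees into three buckets and fall back to "unknown external call" for target-specific intrinsics. This is only a minimal sketch: the prefix list, the labels, and the sample IR (including the intrinsic names in it) are assumptions made for the example, and a real tool would use LLVM's own IR parser rather than a regular expression.

```python
import re

# Hypothetical, incomplete list of target-specific intrinsic prefixes.
# Extend it for whichever targets the analysis has to cover.
TARGET_PREFIXES = ("llvm.x86.", "llvm.arm.", "llvm.ppc.")

# Matches direct calls such as `call void @name(`; indirect calls and
# calls through constant expressions are deliberately not handled here.
CALL_RE = re.compile(r"\bcall\b[^@\n]*@(?P<name>[\w.$]+)\s*\(")

# Illustrative IR text only; it is scanned as a string, never compiled.
SAMPLE_IR = """\
declare void @llvm.x86.sse2.pause()
declare void @llvm.trap()
declare i32 @puts(i8*)

define void @f(i8* %s) {
  call void @llvm.x86.sse2.pause()
  %r = call i32 @puts(i8* %s)
  call void @llvm.trap()
  ret void
}
"""


def classify_callee(name):
    """Return a coarse label describing how an analysis might treat a callee."""
    if name.startswith(TARGET_PREFIXES):
        # Not described in LangRef: assume nothing beyond "external call".
        return "treat-as-external-call"
    if name.startswith("llvm."):
        # Target-independent intrinsic: semantics should be in LangRef.
        return "see-langref"
    return "function"


def classify_calls(ir_text):
    """List (callee, label) pairs for every direct call found in ir_text."""
    return [(m.group("name"), classify_callee(m.group("name")))
            for m in CALL_RE.finditer(ir_text)]


if __name__ == "__main__":
    for callee, label in classify_calls(SAMPLE_IR):
        print(callee, label)
```

The conservative default matters more than the exact labels: anything the tool cannot place should land in the same bucket as an unknown external function.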
> 6. Are all deviations from the reference, no matter how minor,
> considered bugs (either in the implementation or the spec)? If not, what
> deviations are considered acceptable?
If the reference doesn't describe the implementation accurately, we
consider it a bug.  Granted, some bugs are relatively low-priority.
> If so, is it expected that all
> such discovered and possibly corrected deviations will have associated
> bug reports, or might some be corrected in the development repository
> without documentation of the issue outside of a commit message? In other
> words, if I'm working with, say, llvm 2.9 and want to find all
> deviations known to upstream, can I just browse bug reports or will I
> have to go through commit logs as well?
LLVM Bugzilla doesn't contain an entry for every bug; to find every
fix, you'll have to go through commit logs.  Not sure what you're
trying to do here, though.
> These are the questions I have for now, but I may have more as I go
> along. Is this the appropriate place to ask this kind of thing?
Yes.

-Eli

Shea Levy

2011-Oct-20 09:37 UTC

On 10/19/11 11:58 PM, Eli Friedman wrote:
> On Wed, Oct 19, 2011 at 8:20 PM, Shea Levy <shea at shealevy.com> wrote:
>> 2. Are target-specific behaviors documented for each supported target?
> When anything has target-specific behavior, that fact should be
> documented.  Beyond that, if you have a question about what some
> construct is supposed to do, please ask.

What I meant was: for a given target-specific behavior, is there
anywhere I can look to see what the behavior specifically is for, say,
i686-pc-linux, like you are supposed to be able to for
implementation-defined behaviors in C?

>> 5. Are there any language features with non-performance related semantic
>> import (e.g annotations, instructions, intrinsic functions, types, etc.)
>> that are not specified by the reference but are nevertheless implemented
>> in the build system?
> You should be able to analyze the semantics of IR accurately based
> purely on information encoded into the IR.  Every instruction, type,
> attribute etc. should be documented in LangRef.  Platform-specific
> intrinsics are not documented, but can generally be treated like a
> call to an external function.

Platform-specific intrinsics are not documented anywhere, or just not in
the language reference?

>> If so, is it expected that all
>> such discovered and possibly corrected deviations will have associated
>> bug reports, or might some be corrected in the development repository
>> without documentation of the issue outside of a commit message? In other
>> words, if I'm working with, say, llvm 2.9 and want to find all
>> deviations known to upstream, can I just browse bug reports or will I
>> have to go through commit logs as well?
> LLVM Bugzilla doesn't contain an entry for every bug; to find every
> fix, you'll have to go through commit logs.  Not sure what you're
> trying to do here, though.

Some more detail on my project: I'm mostly doing this so I can get
introduced to the field of static analysis, learn what its big problems
are and what's just impossible with it, etc. To that end, however, I've 
decided to try to implement a set of checks that might actually be 
useful, to me at least. In particular, I want to see how many of the 
run-time checks made in hardware when a CPU is in user-mode and memory 
is segmented can be proven to be unnecessary at compile-time. The 
(probably impossible) end-goals to this project would be a) that every 
program which passes its checks would be as safe to run in kernel mode 
with full memory access as it would be in user mode and b) that a 
not-insignificant subset of well-written programs passes its checks. If 
I ever reach the point that I'm actually using this thing to run 
untrusted code in kernel mode, I'll want to know about as many 
deviations from the spec as possible to know if they might affect the 
reasoning my program uses.

Thanks for the help,
Shea Levy

Dan Gohman

2011-Oct-20 17:08 UTC

On Oct 19, 2011, at 8:58 PM, Eli Friedman wrote:
> On Wed, Oct 19, 2011 at 8:20 PM, Shea Levy <shea at shealevy.com> wrote:
> 
>> 4. Are any behaviors undefined by virtue of not being specified in the
>> reference, or are all scenarios that lead to undefined behavior
>> explicitly identified as such?
> 
> We really want to explicitly identify them all in the reference; if
> you have a question about some specific case, please ask.

However, there is a ton of stuff that's not explicitly identified today.

For example, consider a call to a function address bitcasted to a type
incompatible with the type of the function.  Most of us around here
intuitively know this gets undefined behavior because we know how to think
like a C compiler. But LangRef doesn't discuss this. It doesn't even have a
concept of "compatible" types with which to discuss it.
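
To make the case concrete, the sketch below embeds a small piece of IR and flags direct calls made through a constant bitcast whose source and destination function types differ textually. It is illustrative only: the IR uses the typed-pointer syntax of that era (newer LLVM releases spell pointers differently), the function and variable names are invented for the example, and "textually different" is an assumption standing in for a compatibility rule that LangRef does not define.

```python
import re

# Illustrative IR text only; it is scanned as a string, never compiled.
# The first call goes through a bitcast to a different function type,
# the second through a bitcast to the same type.
SAMPLE_IR = """\
declare i32 @callee(i32)

define void @caller() {
  call void bitcast (i32 (i32)* @callee to void ()*)()
  %ok = call i32 bitcast (i32 (i32)* @callee to i32 (i32)*)(i32 1)
  ret void
}
"""

# Captures `call ... bitcast (<src>* @name to <dst>*)` on a single line.
BITCAST_CALL_RE = re.compile(
    r"\bcall\b.*?\bbitcast\s*\(\s*(?P<src>.+?)\*\s+@(?P<name>[\w.$]+)"
    r"\s+to\s+(?P<dst>.+?)\*\s*\)"
)


def normalize(type_text):
    """Collapse runs of whitespace so equal types compare equal."""
    return " ".join(type_text.split())


def find_mismatched_bitcast_calls(ir_text):
    """Return (callee, declared_type, called_type) for each suspicious call."""
    findings = []
    for m in BITCAST_CALL_RE.finditer(ir_text):
        src = normalize(m.group("src"))
        dst = normalize(m.group("dst"))
        if src != dst:
            findings.append((m.group("name"), src, dst))
    return findings


if __name__ == "__main__":
    for callee, declared, called in find_mismatched_bitcast_calls(SAMPLE_IR):
        print(f"@{callee}: declared as {declared}, called as {called}")
```

A checker like this can only report that the types differ. Whether a given mismatch is harmless in practice is exactly the part the reference leaves open.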

What should the rules be?  If we look through LLVM's source code, we
find that the inliner has code for smoothing over caller/callee mismatches.
However, we can't translate this logic into LangRef because it does things
that are impossible to do for non-inlined calls in most backends.  If we
dig through every backend, we could come up with a minimal set of
functionality that could be broadly supported.  However, this set would be
too minimal for clang, for example, which regularly bitcasts objc_msgSend
in ways that it knows will work, but only for non-obvious reasons.

You could spend weeks researching all the nuances of just this problem.
In practice, LLVM just doesn't worry about it.  Problems like this tend to
be edge cases that don't cause trouble for most people most of the time.
However, you can find them all over the place if you go looking.

Dan
