On 2009-09-06 20:52, Bill Wendling wrote:
> The problem he's facing here isn't necessarily one of correctness.
> He's dealing with undefined behavior (at least in C code). There are
> no guarantees that the compiler will retain a certain semantic
> interpretation of an undefined construct between different versions of
> the compiler, let alone different optimization levels.
>

Should LLVM IR inherit all that is undefined behavior in C?
That makes it harder to support other languages, or new languages that
want different semantics for things that the C standard defines as
undefined.

BTW even for C gcc has -fno-delete-null-pointer-checks, and the Linux
kernel started using that recently by default after all the exploits
that mapped NULL to valid memory, and took advantage of gcc optimizing
away the NULL checks.

On 2009-09-06 23:22, Chris Lattner wrote:
> On Sep 6, 2009, at 10:52 AM, Bill Wendling wrote:
>
>> The problem he's facing here isn't necessarily one of correctness.
>> He's dealing with undefined behavior (at least in C code). There are
>> no guarantees that the compiler will retain a certain semantic
>> interpretation of an undefined construct between different versions of
>> the compiler, let alone different optimization levels.
>>
>> From what I understand, he wants a particular behavior from the OS (a
>> signal). The compiler shouldn't have to worry about OS semantics in
>> the face of undefined language constructs. That being said, if he
>> wants to implement a couple of passes to change his code, then
>> sure. :-)
>>
>
> This is something that LLVM isn't currently good at, but that we're
> actively interested in improving. Here is some related stuff:
> http://nondot.org/sabre/LLVMNotes/ExceptionHandlingChanges.txt
>

Looks interesting, but it also looks like a lot of work to implement.
Could instructions have a flag that says whether their semantics is
C-like (i.e. undefined behavior when you load from null etc.), or
something else? (throw exception, etc.).
Optimizations that assume the behavior is undefined should be updated to
check that flag, and perform the optimization only if the flag is set to
C-like.

What do you think?

> I don't know of anyone working on this, or planning to work on it in
> the short term though.
>

Although this is something I'd be interested in having, I lack the time
to implement it.

Best regards,
--Edwin
On Sep 6, 2009, at 4:01 PM, Török Edwin <edwintorok at gmail.com> wrote:
> On 2009-09-06 20:52, Bill Wendling wrote:
>> The problem he's facing here isn't necessarily one of correctness.
>> He's dealing with undefined behavior (at least in C code). There are
>> no guarantees that the compiler will retain a certain semantic
>> interpretation of an undefined construct between different versions of
>> the compiler, let alone different optimization levels.
>>
>
> Should LLVM IR inherit all that is undefined behavior in C?

For better or worse, it already inherits some of them. No, I don't
think the idea is to make LLVM dependent on C's way of doing things.
But one must assume some base-level of what to do with a particular
construct.

Apparently, at this time at least, it's considered good to turn a
dereference of null into unreachable. But like Chris mentioned, it's
something that we should improve.

> That makes it harder to support other languages, or new languages that
> want different semantics
> for things that the C standard defines as undefined.

Yup.

> BTW even for C gcc has -fno-delete-null-pointer-checks, and the Linux
> kernel started using that recently
> by default after all the exploits that mapped NULL to valid memory, and
> took advantage of
> gcc optimizing away the NULL checks.
>

What's the effect of this flag? I've never seen it before. :-) If
we're doing something that violates the semantics of this flag, then
it's something we need to fix, of course.

-bw

> On 2009-09-06 23:22, Chris Lattner wrote:
>> On Sep 6, 2009, at 10:52 AM, Bill Wendling wrote:
>>
>>> The problem he's facing here isn't necessarily one of correctness.
>>> He's dealing with undefined behavior (at least in C code). There are
>>> no guarantees that the compiler will retain a certain semantic
>>> interpretation of an undefined construct between different versions of
>>> the compiler, let alone different optimization levels.
>>>
>>> From what I understand, he wants a particular behavior from the OS (a
>>> signal). The compiler shouldn't have to worry about OS semantics in
>>> the face of undefined language constructs. That being said, if he
>>> wants to implement a couple of passes to change his code, then
>>> sure. :-)
>>>
>>
>> This is something that LLVM isn't currently good at, but that we're
>> actively interested in improving. Here is some related stuff:
>> http://nondot.org/sabre/LLVMNotes/ExceptionHandlingChanges.txt
>>
>
> Looks interesting, but it also looks like a lot of work to implement.
> Could instructions have a flag that says whether their semantics is
> C-like (i.e. undefined behavior when you load from null etc.), or
> something else? (throw exception, etc.).
> Optimizations that assume the behavior is undefined should be updated to
> check that flag, and perform the optimization only if the flag is set to
> C-like.
>
> What do you think?
>
>> I don't know of anyone working on this, or planning to work on it in
>> the short term though.
>>
>
> Although this is something I'd be interested in having, I lack the time
> to implement it.
>
> Best regards,
> --Edwin
On 2009-09-07 01:12, Bill Wendling wrote:
> On Sep 6, 2009, at 4:01 PM, Török Edwin <edwintorok at gmail.com> wrote:
>
>> On 2009-09-06 20:52, Bill Wendling wrote:
>>> The problem he's facing here isn't necessarily one of correctness.
>>> He's dealing with undefined behavior (at least in C code). There are
>>> no guarantees that the compiler will retain a certain semantic
>>> interpretation of an undefined construct between different versions of
>>> the compiler, let alone different optimization levels.
>>>
>>
>> Should LLVM IR inherit all that is undefined behavior in C?
>
> For better or worse, it already inherits some of them. No, I don't
> think the idea is to make LLVM dependent on C's way of doing things.
> But one must assume some base-level of what to do with a particular
> construct.
>
> Apparently, at this time at least, it's considered good to turn a
> dereference of null into unreachable. But like Chris mentioned, it's
> something that we should improve.

Ok.

>> That makes it harder to support other languages, or new languages that
>> want different semantics
>> for things that the C standard defines as undefined.
>
> Yup.
>
>> BTW even for C gcc has -fno-delete-null-pointer-checks, and the Linux
>> kernel started using that recently
>> by default after all the exploits that mapped NULL to valid memory, and
>> took advantage of
>> gcc optimizing away the NULL checks.
>>
> What's the effect of this flag? I've never seen it before. :-) If
> we're doing something that violates the semantics of this flag, then
> it's something we need to fix, of course.

At -O2 and higher gcc deletes if (p == NULL) checks after p has been
dereferenced, assuming that a deref of null halts the program.
-fno-delete-null-pointer-checks disables that optimization.

I haven't seen LLVM do this optimization currently, but maybe I just
haven't seen it yet.

From the gcc manpage:

`-fdelete-null-pointer-checks'
    Use global dataflow analysis to identify and eliminate useless
    checks for null pointers. The compiler assumes that dereferencing a
    null pointer would have halted the program. If a pointer is checked
    after it has already been dereferenced, it cannot be null.

    In some environments, this assumption is not true, and programs can
    safely dereference null pointers. Use
    `-fno-delete-null-pointer-checks' to disable this optimization for
    programs which depend on that behavior.

    Enabled at levels `-O2', `-O3', `-Os'.

Best regards,
--Edwin
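To make the manpage text above concrete, here is a minimal C sketch of the pattern it describes. The function names are made up for illustration and do not come from gcc or the kernel. In the first function the pointer is dereferenced before it is checked, so a compiler applying the optimization described above may treat the check as always false and drop it; in the second the check comes first and has to stay.

```c
#include <assert.h>
#include <stddef.h>

/* Dereference first, check afterwards. Passing NULL here is undefined
   behavior in C, so a compiler may assume p != NULL once *p has been
   evaluated and remove the check below. */
int read_then_check(const int *p) {
    int value = *p;      /* the dereference comes first */
    if (p == NULL)       /* candidate for deletion by the optimizer */
        return -1;
    return value;
}

/* Check first, dereference afterwards. No dereference precedes the
   check, so the check cannot be removed on those grounds. */
int check_then_read(const int *p) {
    if (p == NULL)
        return -1;
    return *p;
}

/* Usage example. Only valid pointers are passed to read_then_check,
   so this demo itself stays within defined behavior. */
void demo_null_checks(void) {
    int x = 42;
    assert(read_then_check(&x) == 42);
    assert(check_then_read(&x) == 42);
    assert(check_then_read(NULL) == -1);
}
```

Whether a given compiler version actually removes the check in read_then_check depends on the optimization level and flags, so the sketch only shows where the optimization is permitted, not that it will happen.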
On Sep 6, 2009, at 2:01 PM, Török Edwin wrote:
> On 2009-09-06 20:52, Bill Wendling wrote:
>> The problem he's facing here isn't necessarily one of correctness.
>> He's dealing with undefined behavior (at least in C code). There are
>> no guarantees that the compiler will retain a certain semantic
>> interpretation of an undefined construct between different versions of
>> the compiler, let alone different optimization levels.
>>
>
> Should LLVM IR inherit all that is undefined behavior in C?

Yes, where it is useful for optimization purposes.

> That makes it harder to support other languages, or new languages that
> want different semantics
> for things that the C standard defines as undefined.

This is another question though. I think that LLVM should support
taking advantage of undefined behavior in C, but it should also allow
other languages to model what they need.

As a concrete example, there is no reason not to add a "bit" to
LoadInst saying whether an "invalid" load is undefined or whether it
causes an "exception". The fun part is nailing down which cases of
"invalid" are allowed, but it isn't that big of a deal.

>>
>> This is something that LLVM isn't currently good at, but that we're
>> actively interested in improving. Here is some related stuff:
>> http://nondot.org/sabre/LLVMNotes/ExceptionHandlingChanges.txt
>>
>
> Looks interesting, but it also looks like a lot of work to implement.

Well that is why it hasn't been done yet :)

> Could instructions have a flag that says whether their semantics is
> C-like (i.e. undefined behavior when you load from null etc.), or
> something else? (throw exception, etc.).

Yes. You need to tell the optimizer what the possible control flow is
though, or else it will move operations in invalid ways.

> What do you think?

Right!

-Chris
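The "bit" on a load can be pictured with a small toy model. This is only a sketch in plain C under stated assumptions: the struct, the enums, and simplify_load are invented for illustration and are not LLVM types or APIs. The point is just that an optimization which relies on undefined behavior would consult the flag before firing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-load flag: what an "invalid" load means. */
enum invalid_load_semantics {
    INVALID_LOAD_IS_UNDEFINED,      /* C-like: the optimizer may assume it never happens */
    INVALID_LOAD_RAISES_EXCEPTION   /* the language requires an observable exception */
};

/* Toy stand-in for a load instruction; not LLVM's LoadInst. */
struct toy_load {
    bool address_is_known_null;
    enum invalid_load_semantics on_invalid;
};

enum toy_rewrite { KEEP_LOAD, REPLACE_WITH_UNREACHABLE };

/* Turn a load from a known-null address into "unreachable" only when
   the load is flagged as having C-like undefined behavior. */
enum toy_rewrite simplify_load(const struct toy_load *load) {
    assert(load != NULL);
    if (load->address_is_known_null &&
        load->on_invalid == INVALID_LOAD_IS_UNDEFINED)
        return REPLACE_WITH_UNREACHABLE;
    return KEEP_LOAD;
}
```

The sketch leaves out the harder part raised in the reply: a load that can raise an exception also introduces possible control flow, and the optimizer has to know about that before it moves operations around it.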
On 2009-09-07 18:29, Chris Lattner wrote:
>
> On Sep 6, 2009, at 2:01 PM, Török Edwin wrote:
>
>> On 2009-09-06 20:52, Bill Wendling wrote:
>>> The problem he's facing here isn't necessarily one of correctness.
>>> He's dealing with undefined behavior (at least in C code). There are
>>> no guarantees that the compiler will retain a certain semantic
>>> interpretation of an undefined construct between different versions of
>>> the compiler, let alone different optimization levels.
>>>
>>
>> Should LLVM IR inherit all that is undefined behavior in C?
>
> Yes, where it is useful for optimization purposes.
>
>> That makes it harder to support other languages, or new languages that
>> want different semantics
>> for things that the C standard defines as undefined.
>
> This is another question though. I think that LLVM should support
> taking advantage of undefined behavior in C, but it should also allow
> other languages to model what they need.
>
> As a concrete example, there is no reason not to add a "bit" to
> LoadInst saying whether an "invalid" load is undefined or whether it
> causes an "exception". The fun part is nailing down which cases of
> "invalid" are allowed, but it isn't that big of a deal.
>
>>>
>>> This is something that LLVM isn't currently good at, but that we're
>>> actively interested in improving. Here is some related stuff:
>>> http://nondot.org/sabre/LLVMNotes/ExceptionHandlingChanges.txt
>>>
>>
>> Looks interesting, but it also looks like a lot of work to implement.
>
> Well that is why it hasn't been done yet :)
>
>> Could instructions have a flag that says whether their semantics is
>> C-like (i.e. undefined behavior when you load from null etc.), or
>> something else? (throw exception, etc.).
>
> Yes. You need to tell the optimizer what the possible control flow is
> though, or else it will move operations in invalid ways.
>

Another crazy idea: what if we'd model the invalid/undefined behavior
via an llvm.undefinedbehavior intrinsic that has a parameter specifying
the kind of undefined behavior.
Optimizers should then either insert calls to this intrinsic, or do
whatever they do for C currently if TargetData says
llvm.undefinedbehavior should not be preserved.

Languages that need to handle these undefined behaviors could define
llvm.undefinedbehavior to throw an exception, call a runtime function,
etc.

This should work even if functions are marked nounwind, since the
unwinder will find the first stackframe that does have a landingpad and
land there, right [*]?
Frontends for languages that want an exception for undefined behavior
could then use invoke/unwind too. Once LLVM has a better invoke, they'll
switch to that of course.

[*] it seems to work for LLVM at least: operator new throws
std::bad_alloc and opt's catch() catches it, although all of llvm is
compiled with no-exceptions.

Best regards,
--Edwin
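A rough way to picture the proposal above, in plain C rather than LLVM IR. Everything here is an assumption made for illustration: toy_undefined_behavior stands in for the proposed llvm.undefinedbehavior intrinsic, the ub_kind values are invented, and setjmp/longjmp is used as a crude substitute for unwinding to a landing pad. It is not how LLVM implements anything.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical kinds of undefined behavior. 0 is reserved because
   setjmp returns 0 when it is called directly. */
enum ub_kind { UB_NONE = 0, UB_NULL_LOAD = 1 };

/* Stand-in for a landing pad: where control resumes after a report. */
static jmp_buf ub_landing_pad;

/* Hypothetical runtime hook in the role of the proposed intrinsic.
   It never returns normally; it jumps to the landing pad. */
static void toy_undefined_behavior(enum ub_kind kind) {
    longjmp(ub_landing_pad, (int)kind);
}

/* What a frontend might emit for a load whose invalid case has to be
   reported instead of being treated as undefined. */
static int checked_load(const int *p) {
    if (p == NULL)
        toy_undefined_behavior(UB_NULL_LOAD);
    return *p;
}

/* Returns UB_NONE and stores the loaded value in *out, or returns the
   kind of undefined behavior that was reported. */
int try_load(const int *p, int *out) {
    assert(out != NULL);
    switch (setjmp(ub_landing_pad)) {
    case 0:
        *out = checked_load(p);
        return UB_NONE;
    case UB_NULL_LOAD:
        return UB_NULL_LOAD;
    default:
        return -1;  /* unknown kind */
    }
}

/* Usage example. */
void demo_ub_hook(void) {
    int x = 7, v = 0;
    assert(try_load(&x, &v) == UB_NONE && v == 7);
    assert(try_load(NULL, &v) == UB_NULL_LOAD);
}
```

A language that wants C-like behavior would simply not emit the call to the hook, which corresponds to the "should not be preserved" case in the proposal.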