thr3ads.net - llvm dev - [LLVMdev] loads from a null address and optimizations [Sep 2009]

Török Edwin

2009-Sep-06 21:01 UTC

[LLVMdev] loads from a null address and optimizations

On 2009-09-06 20:52, Bill Wendling wrote:
> The problem he's facing here isn't necessarily one of correctness.
> He's dealing with undefined behavior (at least in C code). There are
> no guarantees that the compiler will retain a certain semantic
> interpretation of an undefined construct between different versions of
> the compiler, let alone different optimization levels.
>
Should LLVM IR inherit all that is undefined behavior in C?
That makes it harder to support other languages, or new languages that
want different semantics for things that the C standard defines as
undefined.

BTW even for C gcc has -fno-delete-null-pointer-checks, and the Linux
kernel started using that recently by default after all the exploits
that mapped NULL to valid memory, and took advantage of gcc optimizing
away the NULL checks.


On 2009-09-06 23:22, Chris Lattner wrote:
> On Sep 6, 2009, at 10:52 AM, Bill Wendling wrote:
>
>> The problem he's facing here isn't necessarily one of correctness.
>> He's dealing with undefined behavior (at least in C code). There are
>> no guarantees that the compiler will retain a certain semantic
>> interpretation of an undefined construct between different versions of
>> the compiler, let alone different optimization levels.
>>
>> From what I understand, he wants a particular behavior from the OS (a
>> signal). The compiler shouldn't have to worry about OS semantics in
>> the face of undefined language constructs. That being said, if he
>> wants to implement a couple of passes to change his code, then
>> sure. :-)
>>
>
> This is something that LLVM isn't currently good at, but that we're
> actively interested in improving.  Here is some related stuff:
> http://nondot.org/sabre/LLVMNotes/ExceptionHandlingChanges.txt
>   
Looks interesting, but it also looks like a lot of work to implement.
Could instructions have a flag that says whether their semantics is
C-like (i.e. undefined behavior when you load from null etc.), or
something else? (throw exception, etc.).
Optimizations that assume the behavior is undefined should be updated to
check that flag, and perform the optimization only if the flag is set to
C-like.

What do you think?
> I don't know of anyone working on this, or planning to work on it in  
> the short term though.
>   

Although this is something I'd be interested in having, I lack the time
to implement it.

Best regards,
--Edwin

Bill Wendling

2009-Sep-06 22:12 UTC

[LLVMdev] loads from a null address and optimizations

On Sep 6, 2009, at 4:01 PM, Török Edwin <edwintorok at gmail.com> wrote:
> On 2009-09-06 20:52, Bill Wendling wrote:
>> The problem he's facing here isn't necessarily one of correctness.
>> He's dealing with undefined behavior (at least in C code). There are
>> no guarantees that the compiler will retain a certain semantic
>> interpretation of an undefined construct between different versions of
>> the compiler, let alone different optimization levels.
>>
>
> Should LLVM IR inherit all that is undefined behavior in C?
For better or worse, it already inherits some of them. No, I don't  
think the idea is to make LLVM dependent on C's way of doing things.  
But one must assume some base-level of what to do with a particular  
construct.

Apparently, at this time at least, it's considered good to turn a
dereference of null into unreachable. But like Chris mentioned, it's
something that we should improve.
> That makes it harder to support other languages, or new languages that
> want different semantics
> for things that the C standard defines as undefined.
Yup.
> BTW even for C gcc has -fno-delete-null-pointer-checks, and the Linux
> kernel started using that recently by default after all the exploits
> that mapped NULL to valid memory, and took advantage of gcc optimizing
> away the NULL checks.
>
What's the effect of this flag? I've never seen it before. :-) If
we're doing something that violates the semantics of this flag, then
it's something we need to fix, of course.

-bw
> On 2009-09-06 23:22, Chris Lattner wrote:
>> On Sep 6, 2009, at 10:52 AM, Bill Wendling wrote:
>>
>>> The problem he's facing here isn't necessarily one of correctness.
>>> He's dealing with undefined behavior (at least in C code). There are
>>> no guarantees that the compiler will retain a certain semantic
>>> interpretation of an undefined construct between different versions of
>>> the compiler, let alone different optimization levels.
>>>
>>> From what I understand, he wants a particular behavior from the OS (a
>>> signal). The compiler shouldn't have to worry about OS semantics in
>>> the face of undefined language constructs. That being said, if he
>>> wants to implement a couple of passes to change his code, then
>>> sure. :-)
>>>
>>
>> This is something that LLVM isn't currently good at, but that we're
>> actively interested in improving.  Here is some related stuff:
>> http://nondot.org/sabre/LLVMNotes/ExceptionHandlingChanges.txt
>>
>
> Looks interesting, but it also looks like a lot of work to implement.
> Could instructions have a flag that says whether their semantics is
> C-like (i.e. undefined behavior when you load from null etc.), or
> something else? (throw exception, etc.).
> Optimizations that assume the behavior is undefined should be updated to
> check that flag, and perform the optimization only if the flag is set to
> C-like.
>
> What do you think?
>
>> I don't know of anyone working on this, or planning to work on it in
>> the short term though.
>>
>
> Although this is something I'd be interested in having, I lack the time
> to implement it.
>
> Best regards,
> --Edwin

Török Edwin

2009-Sep-07 08:37 UTC

[LLVMdev] loads from a null address and optimizations

On 2009-09-07 01:12, Bill Wendling wrote:
> On Sep 6, 2009, at 4:01 PM, Török Edwin <edwintorok at gmail.com> wrote:
>
>> On 2009-09-06 20:52, Bill Wendling wrote:
>>> The problem he's facing here isn't necessarily one of correctness.
>>> He's dealing with undefined behavior (at least in C code). There are
>>> no guarantees that the compiler will retain a certain semantic
>>> interpretation of an undefined construct between different versions of
>>> the compiler, let alone different optimization levels.
>>>
>>
>> Should LLVM IR inherit all that is undefined behavior in C?
>
> For better or worse, it already inherits some of them. No, I don't
> think the idea is to make LLVM dependent on C's way of doing things.
> But one must assume some base-level of what to do with a particular
> construct.
>
> Apparently, at this time at least, it's considered good to turn a
> dereference of null into unreachable. But like chris mentioned, it's
> something that we should improve.
Ok.
>
>> That makes it harder to support other languages, or new languages that
>> want different semantics for things that the C standard defines as
>> undefined.
>
> Yup.
>
>> BTW even for C gcc has -fno-delete-null-pointer-checks, and the Linux
>> kernel started using that recently by default after all the exploits
>> that mapped NULL to valid memory, and took advantage of gcc optimizing
>> away the NULL checks.
>>
> What's the effect of this flag? I've never seen it before. :-) If
> we're doing something that violates the semantics of this flag, then
> it's something we need to fix, of course.
At -O2 and higher, gcc deletes if (p == NULL) checks that come after p
has been dereferenced, on the assumption that a dereference of null
halts the program. -fno-delete-null-pointer-checks disables that
optimization.
I haven't seen LLVM do this optimization so far, but maybe I just
haven't run into it yet.

From the gcc manpage:

`-fdelete-null-pointer-checks'
     Use global dataflow analysis to identify and eliminate useless
     checks for null pointers.  The compiler assumes that dereferencing
     a null pointer would have halted the program.  If a pointer is
     checked after it has already been dereferenced, it cannot be null.

     In some environments, this assumption is not true, and programs can
     safely dereference null pointers.  Use
     `-fno-delete-null-pointer-checks' to disable this optimization for
     programs which depend on that behavior.

     Enabled at levels `-O2', `-O3', `-Os'.
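
A minimal C sketch of the pattern the manpage text describes, for illustration only. The struct and function names are hypothetical and are not taken from gcc or the Linux kernel; whether a particular compiler actually removes the check depends on its version, target, and flags.

```c
#include <stddef.h>

struct device {
    int flags;
};

/* Hypothetical example. The load of dev->flags comes before the null
 * check, so a compiler that assumes a null dereference halts the
 * program may infer dev != NULL at the check and delete the early
 * return. */
int read_flags_unsafe(struct device *dev)
{
    int flags = dev->flags;   /* undefined behavior in C if dev is NULL */
    if (dev == NULL)          /* candidate for removal when the optimization is on */
        return -1;
    return flags;
}

/* Checking before the dereference gives the optimizer nothing to
 * infer from, so the check is kept either way. */
int read_flags_checked(struct device *dev)
{
    if (dev == NULL)
        return -1;
    return dev->flags;
}
```

With a valid pointer both functions return the stored flags. Only read_flags_checked has defined behavior when passed NULL, which is why the first form is a problem in environments where address 0 can be mapped.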


Best regards,
--Edwin

Chris Lattner

2009-Sep-07 15:29 UTC

[LLVMdev] loads from a null address and optimizations

On Sep 6, 2009, at 2:01 PM, Török Edwin wrote:
> On 2009-09-06 20:52, Bill Wendling wrote:
>> The problem he's facing here isn't necessarily one of correctness.
>> He's dealing with undefined behavior (at least in C code). There are
>> no guarantees that the compiler will retain a certain semantic
>> interpretation of an undefined construct between different versions of
>> the compiler, let alone different optimization levels.
>>
>
> Should LLVM IR inherit all that is undefined behavior in C?
Yes, where it is useful for optimization purposes.
> That makes it harder to support other languages, or new languages that
> want different semantics
> for things that the C standard defines as undefined.
This is another question though.  I think that LLVM should support  
taking advantage of undefined behavior in C, but it should also allow  
other languages to model what they need.

As a concrete example, there is no reason not to add a "bit" to  
LoadInst saying whether an "invalid" load is undefined or whether it  
causes an "exception".  The fun part is nailing down which cases of  
"invalid" are allowed, but it isn't that big of a deal.
>>
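
The "bit" on a load could be pictured with a toy model in C. This is a sketch under stated assumptions, not LLVM's actual LoadInst class or any real LLVM API: the enum, the struct, and the function name are all hypothetical, and "address is known null" stands in for whatever notion of "invalid" ends up being nailed down.

```c
#include <stdbool.h>

/* Toy model only: not LLVM's LoadInst and not a real LLVM interface. */
enum invalid_load_semantics {
    INVALID_LOAD_UNDEFINED, /* C-like: the optimizer may assume it never happens */
    INVALID_LOAD_TRAPS      /* the source language requires an observable exception */
};

struct toy_load {
    bool address_is_known_null;             /* assumed result of some analysis */
    enum invalid_load_semantics on_invalid; /* the proposed per-instruction bit */
};

/* Returns true if a pass may replace the load with "unreachable".
 * Only loads marked C-like qualify; a trapping load has to stay. */
bool may_fold_to_unreachable(const struct toy_load *ld)
{
    return ld->address_is_known_null &&
           ld->on_invalid == INVALID_LOAD_UNDEFINED;
}
```

The point of the sketch is that every optimization relying on undefined behavior would consult the bit before firing, so a frontend for a language with trapping loads keeps its semantics without disabling the optimization for C.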
>> This is something that LLVM isn't currently good at, but that we're
>> actively interested in improving.  Here is some related stuff:
>> http://nondot.org/sabre/LLVMNotes/ExceptionHandlingChanges.txt
>>
>
> Looks interesting, but it also looks like a lot of work to implement.
Well that is why it hasn't been done yet :)
> Could instructions have a flag that says whether their semantics is
> C-like (i.e. undefined behavior when you load from null etc.), or
> something else? (throw exception, etc.).
Yes.  You need to tell the optimizer what the possible control flow is  
though, or else it will move operations in invalid ways.
> What do you think?
Right!

-Chris

Török Edwin

2009-Sep-07 15:36 UTC

[LLVMdev] loads from a null address and optimizations

On 2009-09-07 18:29, Chris Lattner wrote:
>
> On Sep 6, 2009, at 2:01 PM, Török Edwin wrote:
>
>> On 2009-09-06 20:52, Bill Wendling wrote:
>>> The problem he's facing here isn't necessarily one of correctness.
>>> He's dealing with undefined behavior (at least in C code). There are
>>> no guarantees that the compiler will retain a certain semantic
>>> interpretation of an undefined construct between different versions of
>>> the compiler, let alone different optimization levels.
>>>
>>
>> Should LLVM IR inherit all that is undefined behavior in C?
>
> Yes, where it is useful for optimization purposes.
>
>> That makes it harder to support other languages, or new languages that
>> want different semantics for things that the C standard defines as
>> undefined.
>
> This is another question though.  I think that LLVM should support
> taking advantage of undefined behavior in C, but it should also allow
> other languages to model what they need.
>
> As a concrete example, there is no reason not to add a "bit" to
> LoadInst saying whether an "invalid" load is undefined or whether it
> causes an "exception".  The fun part is nailing down which cases of
> "invalid" are allowed, but it isn't that big of a deal.
>
>>>
>>> This is something that LLVM isn't currently good at, but that we're
>>> actively interested in improving.  Here is some related stuff:
>>> http://nondot.org/sabre/LLVMNotes/ExceptionHandlingChanges.txt
>>>
>>
>> Looks interesting, but it also looks like a lot of work to implement.
>
> Well that is why it hasn't been done yet :)
>
>> Could instructions have a flag that says whether their semantics is
>> C-like (i.e. undefined behavior when you load from null etc.), or
>> something else? (throw exception, etc.).
>
> Yes.  You need to tell the optimizer what the possible control flow is
> though, or else it will move operations in invalid ways.
>
Another crazy idea: what if we modeled the invalid/undefined behavior
via an llvm.undefinedbehavior intrinsic that has a parameter specifying
the kind of undefined behavior?
Optimizers should then either insert calls to this intrinsic, or do
whatever they do for C currently if TargetData says
llvm.undefinedbehavior should not be preserved.
Languages that need to handle these undefined behaviors could define
llvm.undefinedbehavior to throw an exception, call a runtime function,
etc.

This should work even if functions are marked nounwind, since the
unwinder will find the first stackframe that does have a landingpad and
land there, right [*]?
Frontends for languages that want an exception for undefined behavior
could then use invoke/unwind too. When LLVM has a better invoke they'll
switch to that, of course.

[*] it seems to work for LLVM at least, operator new throws
std::bad_alloc and opt's catch() catches it, although all of llvm is
compiled with no-exceptions.
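
A rough way to picture the intrinsic idea, written in plain C rather than LLVM IR. Everything here is an assumption made for illustration: undefined_behavior, ub_kind, checked_load, and try_load are hypothetical names, the explicit null test stands in for a call the optimizer or frontend would insert, and setjmp/longjmp stands in for invoke/unwind.

```c
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical "kind" parameter of the proposed intrinsic. */
enum ub_kind { UB_NULL_LOAD = 1, UB_OTHER = 2 };

static jmp_buf ub_landing_pad;

/* Stand-in for what a language runtime might bind the proposed
 * llvm.undefinedbehavior to: here it unwinds to the landing pad. */
void undefined_behavior(enum ub_kind kind)
{
    longjmp(ub_landing_pad, (int)kind);
}

/* What a load could look like once the call has been inserted. */
int checked_load(const int *p)
{
    if (p == NULL)
        undefined_behavior(UB_NULL_LOAD);
    return *p;
}

/* Plays the role of the frame that owns the landing pad. Returns the
 * loaded value, -1 for a null load, -2 for any other kind. */
int try_load(const int *p)
{
    switch (setjmp(ub_landing_pad)) {
    case 0:
        return checked_load(p);
    case UB_NULL_LOAD:
        return -1;
    default:
        return -2;
    }
}
```

Calling try_load with a valid pointer returns the pointed-to value, and calling it with NULL takes the handler path and returns -1. A C frontend would instead leave the intrinsic unpreserved and keep today's behavior.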

Best regards,
--Edwin
