Nuno Lopes via llvm-dev

2017-Nov-08 17:24 UTC

[llvm-dev] Is it ok to allocate > half of address space?

Hi,

I was looking into the semantics of GEP inbounds and some BasicAA  
rules and I'm wondering if it's valid in LLVM IR to allocate more than  
half of the address space with a global variable or an alloca.
If that's a scenario we want to consider, then we have problems :)

Consider this C code (32 bits):
#include <string.h>

char obj[0x80000008];

char f() {
   char *p = obj + 0x79999999;
   char *q = obj + 0x80000000;
   *q = 1;
   memcpy(p, "abcd", 4);
   return *q;
}


Clearly the stores alias, and the memcpy should override the value  
written by "*q = 1".

I dunno if this is legal in C or not, but the IR produced by clang  
looks like (32 bits):

@obj = common global [2147483656 x i8] zeroinitializer, align 1

define signext i8 @f() {
   store i8 1, i8* getelementptr inbounds (i8, i8* getelementptr inbounds ([2147483656 x i8], [2147483656 x i8]* @obj, i32 0, i32 0), i32 -2147483648), align 1
   call void @llvm.memcpy.p0i8.p0i8.i32(i8* getelementptr inbounds ([2147483656 x i8], [2147483656 x i8]* @obj, i32 0, i32 2040109465), i8* getelementptr inbounds ([5 x i8], [5 x i8]* @.str, i32 0, i32 0), i32 4, i32 1, i1 false)
   %1 = load i8, i8* getelementptr inbounds (i8, i8* getelementptr inbounds ([2147483656 x i8], [2147483656 x i8]* @obj, i32 0, i32 0), i32 -2147483648), align 1
   ret i8 %1
}

With -O2, the store to q gets forwarded, and so we get "ret i8 1".
So, BasicAA concluded that p and q don't alias. The culprit is an  
overflow in BasicAAResult::isGEPBaseAtNegativeOffset().
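
For illustration, here is a small C sketch of why the index for q shows up as
"i32 -2147483648" in the IR: read as a signed 32-bit value, 0x80000000 wraps
to the most negative number, so q looks as if it sits before p. The helper
names are made up for this sketch, and this is not BasicAA's actual logic,
only the arithmetic behind the overflow.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: reinterpret a 32-bit offset the way a signed i32
   GEP index would be read (two's complement), without relying on
   implementation-defined integer conversions. */
int64_t as_signed32(uint32_t offset) {
    int64_t value = (int64_t)offset;
    if (value > INT32_MAX) {
        value -= (int64_t)1 << 32;   /* wrap into the negative range */
    }
    return value;
}

/* Self-check against the constants that appear in the IR. */
void check_offsets(void) {
    assert(as_signed32(0x79999999u) == 2040109465LL);   /* index used for p */
    assert(as_signed32(0x80000000u) == -2147483648LL);  /* index used for q */
    /* Read as signed, q appears to sit before p. */
    assert(as_signed32(0x80000000u) < as_signed32(0x79999999u));
}
```

Calling check_offsets() from a test program should pass silently.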

So my question is do we care about this use case where a single  
allocation can take more than half of the address space?

Thanks,
Nuno

Björn Pettersson A via llvm-dev

2017-Nov-08 17:41 UTC

Hi Nuno.
I can't answer your question, but I know that Mikael Holmén wrote a trouble
report about problems in GVN related to objects larger than half of address
space:
  https://bugs.llvm.org/show_bug.cgi?id=34344

It ended up in a long discussion with Eli Friedman, and then I think we just
left it as an open trouble report.

/Björn

Nuno Lopes via llvm-dev

2017-Nov-08 18:13 UTC

Many thanks for the pointer!  I missed that bug report since the title  
was about GVN.

If there's interest in supporting this feature I can help since we've  
formalized most of BasicAA. I can easily verify if proposed changes  
are correct. (I'll release the code soon).

Nuno



Alexander Cherepanov via llvm-dev

2017-Nov-08 19:26 UTC

On 11/08/2017 08:24 PM, Nuno Lopes via llvm-dev wrote:
> I was looking into the semantics of GEP inbounds and some BasicAA rules
> and I'm wondering if it's valid in LLVM IR to allocate more than half of
> the address space with a global variable or an alloca.
> If that's a scenario we want to consider, then we have problems :)
> 
> Consider this C code (32 bits):
> #include <string.h>
> 
> char obj[0x80000008];
> 
> char f() {
>    char *p = obj + 0x79999999;

I guess you mean 0x7fffffff here.

>    char *q = obj + 0x80000000;
>    *q = 1;
>    memcpy(p, "abcd", 4);
>    return *q;
> }
> 
> 
> So my question is do we care about this use case where a single
> allocation can take more than half of the address space?

Yeah, I'm curious about it too. One of the complications is that the
compiler doesn't control the whole situation -- the size of the allocation
could be read by the program from outside and the allocation could be
done by a libc (and glibc will happily allocate more than half the
address space).

There is a good discussion of various related topics in 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67999 .
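
To make the suggested 0x7fffffff correction concrete: as written, the 4-byte
memcpy at obj + 0x79999999 ends well before obj + 0x80000000, whereas one
starting at obj + 0x7fffffff covers that byte. A minimal sketch of the check,
where ranges_overlap is an illustrative name and not anything from LLVM:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: do the byte ranges [a, a + a_len) and
   [b, b + b_len), given as unsigned 32-bit offsets into one object,
   share at least one byte? The end points are computed in 64 bits so
   the sums cannot wrap. */
bool ranges_overlap(uint32_t a, uint32_t a_len, uint32_t b, uint32_t b_len) {
    uint64_t a_end = (uint64_t)a + a_len;
    uint64_t b_end = (uint64_t)b + b_len;
    return a < b_end && b < a_end;
}
```

With the offsets from the test case, ranges_overlap(0x79999999u, 4, 0x80000000u, 1)
is false, while ranges_overlap(0x7fffffffu, 4, 0x80000000u, 1) is true.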

-- 
Alexander Cherepanov

Friedman, Eli via llvm-dev

2017-Nov-08 22:06 UTC

On 11/8/2017 9:24 AM, Nuno Lopes via llvm-dev wrote:
> So my question is do we care about this use case where a single
> allocation can take more than half of the address space?

According to LangRef, your IR currently has undefined behavior: the rules
for "inbounds" GEPs say that indexes are treated as signed values.  And
solving that would involve changing the way we represent GEPs in IR, so
I think you can consider that out of scope.

Assuming we're not dealing with inbounds GEPs (e.g. you pass -fwrapv to 
clang), I don't see any particular reason to disallow allocations more 
than half the address-space.
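
As a rough illustration of the signed-index point, the following C sketch
counts how many bytes of an object lie at offsets that a non-negative signed
32-bit index cannot express. The function name is hypothetical and the 32-bit
index width is an assumption taken from the test case.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of bytes in an object of `size` bytes whose
   offset from the base is larger than INT32_MAX, i.e. not expressible as
   a non-negative signed 32-bit index. */
uint64_t bytes_beyond_signed32(uint64_t size) {
    const uint64_t reachable = (uint64_t)INT32_MAX + 1;  /* offsets 0 .. 2^31 - 1 */
    return size > reachable ? size - reachable : 0;
}
```

For the 0x80000008-byte obj from the test case this gives 8: the bytes at
offsets 0x80000000 through 0x80000007.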

-Eli


Nuno Lopes via llvm-dev

2017-Nov-08 23:18 UTC

> According to LangRef, your IR currently has undefined behavior: the rules
> for "inbounds" GEPs say that indexes are treated as signed values.  And
> solving that would involve changing the way we represent GEPs in IR, so I
> think you can consider that out of scope.

Sorry, that was a typo. The test case was not supposed to have inbounds (it
should work without it as well).
The current definition of GEP inbounds is complicated, though. It disallows
the following:
%a = gep %p, 0x88888888
%b = gep inbounds %a, 1

If %a is within bounds, the "gep inbounds" gives a signed overflow even
though it's just a +1 (since 0x88888888 + 1 overflows).
So GEP inbounds disables large objects outright.

BTW I've always wondered why EmitGEPOffset
(http://llvm.org/doxygen/Local_8h_source.html#l00247) doesn't use 'add nsw'
if the semantics of GEP inbounds allows that (if my reading of LangRef is
correct).

> Assuming we're not dealing with inbounds GEPs (e.g. you pass -fwrapv to
> clang), I don't see any particular reason to disallow allocations more
> than half the address-space.

Ok, I can file bug reports for the cases I'm seeing.  I can verify
correctness of fixes as well.  But only starting a week from now; I'm
quite busy at the moment.

Nuno
