Arnold Schwaighofer
2013-Feb-27 15:41 UTC
[LLVMdev] Question about intrinsic function llvm.objectsize
On Feb 27, 2013, at 4:05 AM, Nuno Lopes <nunoplopes at sapo.pt> wrote:> Hi, > > Regarding the definition of object for @llvm.objectsize, it is identical to gcc's __builtin_object_size(). So it's not wrong; it's just the way it was defined to be. > > Regarding the BasicAA's usage of these functions, I'm unsure. It seems to me that isObjectSmallerThan() also expects the same definition, but I didn't review the code carefully. > When you do a load from a certain memory address, basicAA is interested to know if that load will overflow the buffer bounds, which means that no aliasing can occur (or it's an UB operation). > Therefore I don't think basicAA cares about the size of the whole object, but just the remaining part of it (size-offset). But again, I could be wrong here.No, it cares about the size of the whole object in the code quoted below. const Value *O1 = GetUnderlyingObject(V1, TD); But that is fine. See answers below.> > > Quoting Shuxin Yang <shuxin.llvm at gmail.com>: > >> Hi, >> >> In the following instruction sequence, llvm.objectsize.i64(p) returns 6 (the entire *.ll is attached to the mail). >> Is this correct? Shouldn't the "object" refer to the entire block of memory being allocated? >>No, the ""llvm.objectsize intrinsic" is defined as the size of the object pointed to by the first argument, in your case "p+50". To implement the llvm.objectsize intrinsic we call getObjectSize on the "p+50" pointer. See http://gcc.gnu.org/onlinedocs/gcc/Object-Size-Checking.html for intended use.>> (char*) p = malloc(56) >> llvm.objectisize.i32(p+50); >> >> Thanks >> Shuxin >> >> >> >> This question is related to PR14988 (failure in bootstrap build with LTO). >> Part of the reason is that >> the compiler interpret the pretty vague team "object" in different ways -- the "object" suggested by >> line 359 @ figure 1 is just a part of the "object" implied by Figure 2. >>Well, it depends on which object you are talking about an object pointed to by a pointer or an "underlying object". Both are valid things to talk about in varying contexts.>> I try to fix the problem PR14988 by replacing line 359 with "Size = Offset.getZextValue()"; It does fix the problem.\That does not sound right. The size of an object pointed to by a pointer is the size of the underlying object minus any offsets.>> However, I'm wondering in what situations the Offset != 0. So I put an assertion right before line 359,The offset is not null if the "Ptr" passed does not point to an underlying object.>> saying "assert(Offset != 0)". It catches two cases in SingleSource test suite. I investigate one of them, >> the assertion is triggered when the llvm::getObjectSize() is called by instcombine (instead of alias analyzer)Yes, this observation makes sense. In this case there is an offset (you are passing "ptr+offset") and you want this behavior (llvm.objectsize is defined as such). See below.>> in an attempt to replace llvm.objectsize() with a constant. I think the way llvm::getObjectSize() interpret "object" is wrong. >>No, it is behaving as expected depending on where you call it from. See below.>> >> Figure 1 >> cat -n lib/Analysis/MemoryBuiltins.cpp >> 344 bool llvm::getObjectSize(const Value *Ptr, uint64_t &Size, const DataLayout *TD, >> 345 const TargetLibraryInfo *TLI, bool RoundToAlign) { >> 346 if (!TD) >> 347 return false; >> 348 >> 349 ObjectSizeOffsetVisitor Visitor(TD, TLI, Ptr->getContext(), RoundToAlign); >> 350 SizeOffsetType Data = Visitor.compute(const_cast<Value*>(Ptr)); >> 351 if (!Visitor.bothKnown(Data)) >> 352 return false; >> 353 >> 354 APInt ObjSize = Data.first, Offset = Data.second; >> 355 // check for overflow >> 356 if (Offset.slt(0) || ObjSize.ult(Offset)) >> 357 Size = 0; >> 358 else >> 359 Size = (ObjSize - Offset).getZExtValue(); ??? What the hack is "object"?????? >> 360 return true; >> 361 } >>The object is the object pointed to by "Ptr"; depending on where you call the "getObjectSize" function from these might be different things: an underlying object or an object "based on" (http://llvm.org/docs/LangRef.html#pointer-aliasing-rules) an underlying object. In the "llvm.objectsize" context we pass an object "based on p" to getObjectSize: "p+50". In the basicaa context, we wanna know whether an access is beyond the bounds of an underlying object (undefined behavior land) so we pass the underlying object (which in your example would be the "p" returned from malloc) to the getObjectSize function. In the first case (passing "p+50" to getObjectSize) ObjSize should be 56 and the Offset will be 50 yielding 6 in the second case your ObjSize will be 56 and the offset is zero because basicaa passed the underlying object "p". I have to agree though the term "object" is somewhat confusing.>> Figure 2 >> cat -n lib/Analysis/BasicAliasAnalysis.cpp >> 1205 // If the size of one access is larger than the entire object on the other >> 1206 // side, then we know such behavior is undefined and can assume no alias. >> 1207 if (TD) >> 1208 if ((V1Size != UnknownSize && isObjectSmallerThan(O2, V1Size, *TD, *TLI)) || >> 1209 (V2Size != UnknownSize && isObjectSmallerThan(O1, V2Size, *TD, *TLI))) >> 1210 return NoAlias;Best, Arnold
Shuxin Yang
2013-Feb-27 18:37 UTC
[LLVMdev] Question about intrinsic function llvm.objectsize
Hi, Nuno and Arnold: Thank you all for the input. Let me coin a term, say "clique" for this discussion to avoid unnecessary confusion. A clique is statically or dynamically allocated type-free stretch of memory. A "clique" 1) is maximal in the sense that a clique dose not have any enclosing data structure that can completely cover or, partially overlap with it. e.g1 x = malloc(100) the clique has 100 byte e.g. foo( T t; xxxx = t.y.z; } // the corresponding clique accessed by the load is t of sizeof(T). e.g . bar(T* t) { xxxx = t->y.z; } // we have no idea which clique the load is accessing as we don't // know if the *t has enclosing data structure or not. 2). a clique's address cannot be derived from another clique. This feature can be used for disambiguation. Consider two access, one access's corresponding clique has N byte, the other one has M byte, where M > N. It is impossible that these accesses alias. .... In llvm, the "object pointed to by ptr p" seems to take on two meanings. Suppose the corresponding clique of the "object", regardless which context it is in, is C. a): the object refers to the stretch of memory staring from p to the end of the C. b): the entire clique C. In the context of llvm.objectsize(). As you guys pointed out, the "object" take meaning of a). If would not be useful if it take meaning b) as the compiler is not able to use this info to convert run-time testing into a static test. In the context of memory disambiguation, the "object" take meaning of b) in order to take advantage of clique's 2nd feature. Following is the real example I'm investigating where the "object" take on meaning b) leads to miscompilation. ----------------------------------------- struct { T1* p1, T2 *p2, int xyz} mydata; addr1 = phi(&mydata, &mydata); // all operands have same value. addr2 = phi(&mydata.xyz, &mydata.xyz); memcpy(addr1, ... sizeof(mydata)) // mem1 = load addr2->syz // mem2 -------------------------------------------------------- ( But for the phi node, getUnderlyingObject() would return "mydata" as "object" for both memory accesses. ) getObjectSize(mem2) return 4 instead of sizeof(mydata). Now that the "object" only have 4 bytes, the "object" of another access, which has sizeof(mydata) byte, should not cover this object, and hence mistakenly give "noalias". On 2/27/13 7:41 AM, Arnold Schwaighofer wrote:> On Feb 27, 2013, at 4:05 AM, Nuno Lopes <nunoplopes at sapo.pt> wrote: > >> Hi, >> >> Regarding the definition of object for @llvm.objectsize, it is identical to gcc's __builtin_object_size(). So it's not wrong; it's just the way it was defined to be. >> >> Regarding the BasicAA's usage of these functions, I'm unsure. It seems to me that isObjectSmallerThan() also expects the same definition, but I didn't review the code carefully. >> When you do a load from a certain memory address, basicAA is interested to know if that load will overflow the buffer bounds, which means that no aliasing can occur (or it's an UB operation). >> Therefore I don't think basicAA cares about the size of the whole object, but just the remaining part of it (size-offset). But again, I could be wrong here. > No, it cares about the size of the whole object in the code quoted below. > const Value *O1 = GetUnderlyingObject(V1, TD); > > But that is fine. See answers below. > >> >> Quoting Shuxin Yang <shuxin.llvm at gmail.com>: >> >>> Hi, >>> >>> In the following instruction sequence, llvm.objectsize.i64(p) returns 6 (the entire *.ll is attached to the mail). >>> Is this correct? Shouldn't the "object" refer to the entire block of memory being allocated? >>> > No, the ""llvm.objectsize intrinsic" is defined as the size of the object pointed to by the first argument, in your case "p+50". > To implement the llvm.objectsize intrinsic we call getObjectSize on the "p+50" pointer. > > See http://gcc.gnu.org/onlinedocs/gcc/Object-Size-Checking.html for intended use. > >>> (char*) p = malloc(56) >>> llvm.objectisize.i32(p+50); >>> >>> Thanks >>> Shuxin >>> >>> >>> >>> This question is related to PR14988 (failure in bootstrap build with LTO). >>> Part of the reason is that >>> the compiler interpret the pretty vague team "object" in different ways -- the "object" suggested by >>> line 359 @ figure 1 is just a part of the "object" implied by Figure 2. >>> > Well, it depends on which object you are talking about an object pointed to by a pointer or an "underlying object". Both are valid things to talk about in varying contexts. > >>> I try to fix the problem PR14988 by replacing line 359 with "Size = Offset.getZextValue()"; It does fix the problem.\ > That does not sound right. The size of an object pointed to by a pointer is the size of the underlying object minus any offsets. > >>> However, I'm wondering in what situations the Offset != 0. So I put an assertion right before line 359, > The offset is not null if the "Ptr" passed does not point to an underlying object. > >>> saying "assert(Offset != 0)". It catches two cases in SingleSource test suite. I investigate one of them, >>> the assertion is triggered when the llvm::getObjectSize() is called by instcombine (instead of alias analyzer) > Yes, this observation makes sense. In this case there is an offset (you are passing "ptr+offset") and you want this behavior (llvm.objectsize is defined as such). See below. >>> in an attempt to replace llvm.objectsize() with a constant. I think the way llvm::getObjectSize() interpret "object" is wrong. >>> > No, it is behaving as expected depending on where you call it from. See below. >>> Figure 1 >>> cat -n lib/Analysis/MemoryBuiltins.cpp >>> 344 bool llvm::getObjectSize(const Value *Ptr, uint64_t &Size, const DataLayout *TD, >>> 345 const TargetLibraryInfo *TLI, bool RoundToAlign) { >>> 346 if (!TD) >>> 347 return false; >>> 348 >>> 349 ObjectSizeOffsetVisitor Visitor(TD, TLI, Ptr->getContext(), RoundToAlign); >>> 350 SizeOffsetType Data = Visitor.compute(const_cast<Value*>(Ptr)); >>> 351 if (!Visitor.bothKnown(Data)) >>> 352 return false; >>> 353 >>> 354 APInt ObjSize = Data.first, Offset = Data.second; >>> 355 // check for overflow >>> 356 if (Offset.slt(0) || ObjSize.ult(Offset)) >>> 357 Size = 0; >>> 358 else >>> 359 Size = (ObjSize - Offset).getZExtValue(); ??? What the hack is "object"?????? >>> 360 return true; >>> 361 } >>> > The object is the object pointed to by "Ptr"; depending on where you call the "getObjectSize" function from these might be different things: an underlying object or an object "based on" (http://llvm.org/docs/LangRef.html#pointer-aliasing-rules) an underlying object. > > In the "llvm.objectsize" context we pass an object "based on p" to getObjectSize: "p+50". In the basicaa context, we wanna know whether an access is beyond the bounds of an underlying object (undefined behavior land) so we pass the underlying object (which in your example would be the "p" returned from malloc) to the getObjectSize function. > > In the first case (passing "p+50" to getObjectSize) ObjSize should be 56 and the Offset will be 50 yielding 6 in the second case your ObjSize will be 56 and the offset is zero because basicaa passed the underlying object "p". > > I have to agree though the term "object" is somewhat confusing. > >>> Figure 2 >>> cat -n lib/Analysis/BasicAliasAnalysis.cpp >>> 1205 // If the size of one access is larger than the entire object on the other >>> 1206 // side, then we know such behavior is undefined and can assume no alias. >>> 1207 if (TD) >>> 1208 if ((V1Size != UnknownSize && isObjectSmallerThan(O2, V1Size, *TD, *TLI)) || >>> 1209 (V2Size != UnknownSize && isObjectSmallerThan(O1, V2Size, *TD, *TLI))) >>> 1210 return NoAlias; > Best, > Arnold
Shuxin Yang
2013-Feb-27 18:53 UTC
[LLVMdev] Question about intrinsic function llvm.objectsize
> In the "llvm.objectsize" context we pass an object "based on p" to getObjectSize: "p+50". In the basicaa context, we wanna know whether an access is beyond the bounds of an underlying object (undefined behavior land) so we pass the underlying object (which in your example would be the "p" returned from malloc) to the getObjectSize function. > > In the first case (passing "p+50" to getObjectSize) ObjSize should be 56 and the Offset will be 50 yielding 6 in the second case your ObjSize will be 56 and the offset is zero because basicaa passed the underlying object "p".you figured out an alternative to fix the problem. In the context of alias analysis, it is up to the caller to pass the base addr of the "object" to getObjectSize() by calling getUnderlyingObject(). However, if the base-addr is bit complicated, say, one needs to go through U-D chain (including phi node). In this situation, a helper class ObjectSizeOffsetVisitor will help. My take is to implement another function, call getEntireObjectSize(p) which returns the size of the entire object no matter where the p is pointing to. How does this sound to you? Thanks
Shuxin Yang
2013-Feb-27 18:57 UTC
[LLVMdev] Question about intrinsic function llvm.objectsize
I take one more look of the code, seems like we have to implement getSizeOfEntireObject(p). The reason is the that we have no idea if the value returned from getUnderlyingObject() points to a legal object or not, On 2/27/13 10:53 AM, Shuxin Yang wrote:> >> In the "llvm.objectsize" context we pass an object "based on p" to >> getObjectSize: "p+50". In the basicaa context, we wanna know whether >> an access is beyond the bounds of an underlying object (undefined >> behavior land) so we pass the underlying object (which in your >> example would be the "p" returned from malloc) to the getObjectSize >> function. >> >> In the first case (passing "p+50" to getObjectSize) ObjSize should be >> 56 and the Offset will be 50 yielding 6 in the second case your >> ObjSize will be 56 and the offset is zero because basicaa passed the >> underlying object "p". > you figured out an alternative to fix the problem. In the context of > alias analysis, it is up to the caller to pass > the base addr of the "object" to getObjectSize() by calling > getUnderlyingObject(). > > However, if the base-addr is bit complicated, say, one needs to go > through U-D chain (including phi node). > In this situation, a helper class ObjectSizeOffsetVisitor will help. > > My take is to implement another function, call getEntireObjectSize(p) > which returns the size of the entire object > no matter where the p is pointing to. How does this sound to you? > > Thanks > >
Arnold Schwaighofer
2013-Feb-27 19:21 UTC
[LLVMdev] Question about intrinsic function llvm.objectsize
On Feb 27, 2013, at 12:37 PM, Shuxin Yang <shuxin.llvm at gmail.com> wrote:> Hi, Nuno and Arnold: > > Thank you all for the input. > > Let me coin a term, say "clique" for this discussion to avoid unnecessary confusion. > A clique is statically or dynamically allocated type-free stretch of memory. A "clique" > 1) is maximal in the sense that a clique dose not have any enclosing data structure that can > completely cover or, partially overlap with it. > e.g1 x = malloc(100) > the clique has 100 byte > > e.g. foo( T t; xxxx = t.y.z; } // the corresponding clique accessed by the load is t of sizeof(T). > e.g . bar(T* t) { xxxx = t->y.z; } // we have no idea which clique the load is accessing as we don't > // know if the *t has enclosing data structure or not. > > 2). a clique's address cannot be derived from another clique. > This feature can be used for disambiguation. Consider two access, one access's corresponding > clique has N byte, the other one has M byte, where M > N. It is impossible that these accesses alias. > ….This is not how LLVM disambiguates in the basicaa code fragment you quote below. The code fragment asserts that something can be assumed "no alias" if the access size of an memory access (i.e. a load or store) is bigger than the underlying object of the other accessed object. For you example below: V1Size = sizeof(mydata) V2Size = say 4 assuming that GetUnderlyingObject for addr1 and addr2 returns mydata: getObjectSize(O1) = sizeof(mydata) getObjectSize(O2) = sizeof(mydata) this should be isObjectSmallerThan(O2, V1Size) == false isObjectSmallerThan(O1, V2Size) == false ??> > In llvm, the "object pointed to by ptr p" seems to take on two meanings. Suppose the corresponding > clique of the "object", regardless which context it is in, is C. > a): the object refers to the stretch of memory staring from p to the end of the C. > b): the entire clique C. > > In the context of llvm.objectsize(). As you guys pointed out, the "object" take meaning of a). > If would not be useful if it take meaning b) as the compiler is not able to use this info to convert > run-time testing into a static test. > > In the context of memory disambiguation, the "object" take meaning of b) in order to take advantage of > clique's 2nd feature. > > Following is the real example I'm investigating where the "object" take on meaning b) leads to miscompilation. > > ----------------------------------------- > struct { T1* p1, T2 *p2, int xyz} mydata; > > addr1 = phi(&mydata, &mydata); // all operands have same value. > addr2 = phi(&mydata.xyz, &mydata.xyz); > > memcpy(addr1, ... sizeof(mydata)) // mem1 > = load addr2->syz // mem2 > -------------------------------------------------------- > > ( But for the phi node, getUnderlyingObject() would return "mydata" as "object" for both memory accesses. ) > getObjectSize(mem2) return 4 instead of sizeof(mydata). Now that the "object" only have 4 bytes, > the "object" of another access, which has sizeof(mydata) byte, should not cover this object, and hence > mistakenly give "noalias". > > > > > > On 2/27/13 7:41 AM, Arnold Schwaighofer wrote: >> On Feb 27, 2013, at 4:05 AM, Nuno Lopes <nunoplopes at sapo.pt> wrote: >> >>> Hi, >>> >>> Regarding the definition of object for @llvm.objectsize, it is identical to gcc's __builtin_object_size(). So it's not wrong; it's just the way it was defined to be. >>> >>> Regarding the BasicAA's usage of these functions, I'm unsure. It seems to me that isObjectSmallerThan() also expects the same definition, but I didn't review the code carefully. >>> When you do a load from a certain memory address, basicAA is interested to know if that load will overflow the buffer bounds, which means that no aliasing can occur (or it's an UB operation). >>> Therefore I don't think basicAA cares about the size of the whole object, but just the remaining part of it (size-offset). But again, I could be wrong here. >> No, it cares about the size of the whole object in the code quoted below. >> const Value *O1 = GetUnderlyingObject(V1, TD); >> >> But that is fine. See answers below. >> >>> >>> Quoting Shuxin Yang <shuxin.llvm at gmail.com>: >>> >>>> Hi, >>>> >>>> In the following instruction sequence, llvm.objectsize.i64(p) returns 6 (the entire *.ll is attached to the mail). >>>> Is this correct? Shouldn't the "object" refer to the entire block of memory being allocated? >>>> >> No, the ""llvm.objectsize intrinsic" is defined as the size of the object pointed to by the first argument, in your case "p+50". >> To implement the llvm.objectsize intrinsic we call getObjectSize on the "p+50" pointer. >> >> See http://gcc.gnu.org/onlinedocs/gcc/Object-Size-Checking.html for intended use. >> >>>> (char*) p = malloc(56) >>>> llvm.objectisize.i32(p+50); >>>> >>>> Thanks >>>> Shuxin >>>> >>>> >>>> >>>> This question is related to PR14988 (failure in bootstrap build with LTO). >>>> Part of the reason is that >>>> the compiler interpret the pretty vague team "object" in different ways -- the "object" suggested by >>>> line 359 @ figure 1 is just a part of the "object" implied by Figure 2. >>>> >> Well, it depends on which object you are talking about an object pointed to by a pointer or an "underlying object". Both are valid things to talk about in varying contexts. >> >>>> I try to fix the problem PR14988 by replacing line 359 with "Size = Offset.getZextValue()"; It does fix the problem.\ >> That does not sound right. The size of an object pointed to by a pointer is the size of the underlying object minus any offsets. >> >>>> However, I'm wondering in what situations the Offset != 0. So I put an assertion right before line 359, >> The offset is not null if the "Ptr" passed does not point to an underlying object. >> >>>> saying "assert(Offset != 0)". It catches two cases in SingleSource test suite. I investigate one of them, >>>> the assertion is triggered when the llvm::getObjectSize() is called by instcombine (instead of alias analyzer) >> Yes, this observation makes sense. In this case there is an offset (you are passing "ptr+offset") and you want this behavior (llvm.objectsize is defined as such). See below. >>>> in an attempt to replace llvm.objectsize() with a constant. I think the way llvm::getObjectSize() interpret "object" is wrong. >>>> >> No, it is behaving as expected depending on where you call it from. See below. >>>> Figure 1 >>>> cat -n lib/Analysis/MemoryBuiltins.cpp >>>> 344 bool llvm::getObjectSize(const Value *Ptr, uint64_t &Size, const DataLayout *TD, >>>> 345 const TargetLibraryInfo *TLI, bool RoundToAlign) { >>>> 346 if (!TD) >>>> 347 return false; >>>> 348 >>>> 349 ObjectSizeOffsetVisitor Visitor(TD, TLI, Ptr->getContext(), RoundToAlign); >>>> 350 SizeOffsetType Data = Visitor.compute(const_cast<Value*>(Ptr)); >>>> 351 if (!Visitor.bothKnown(Data)) >>>> 352 return false; >>>> 353 >>>> 354 APInt ObjSize = Data.first, Offset = Data.second; >>>> 355 // check for overflow >>>> 356 if (Offset.slt(0) || ObjSize.ult(Offset)) >>>> 357 Size = 0; >>>> 358 else >>>> 359 Size = (ObjSize - Offset).getZExtValue(); ??? What the hack is "object"?????? >>>> 360 return true; >>>> 361 } >>>> >> The object is the object pointed to by "Ptr"; depending on where you call the "getObjectSize" function from these might be different things: an underlying object or an object "based on" (http://llvm.org/docs/LangRef.html#pointer-aliasing-rules) an underlying object. >> >> In the "llvm.objectsize" context we pass an object "based on p" to getObjectSize: "p+50". In the basicaa context, we wanna know whether an access is beyond the bounds of an underlying object (undefined behavior land) so we pass the underlying object (which in your example would be the "p" returned from malloc) to the getObjectSize function. >> >> In the first case (passing "p+50" to getObjectSize) ObjSize should be 56 and the Offset will be 50 yielding 6 in the second case your ObjSize will be 56 and the offset is zero because basicaa passed the underlying object "p". >> >> I have to agree though the term "object" is somewhat confusing. >> >>>> Figure 2 >>>> cat -n lib/Analysis/BasicAliasAnalysis.cpp >>>> 1205 // If the size of one access is larger than the entire object on the other >>>> 1206 // side, then we know such behavior is undefined and can assume no alias. >>>> 1207 if (TD) >>>> 1208 if ((V1Size != UnknownSize && isObjectSmallerThan(O2, V1Size, *TD, *TLI)) || >>>> 1209 (V2Size != UnknownSize && isObjectSmallerThan(O1, V2Size, *TD, *TLI))) >>>> 1210 return NoAlias; >> Best, >> Arnold >
Reasonably Related Threads
- [LLVMdev] Question about intrinsic function llvm.objectsize
- [LLVMdev] Question about intrinsic function llvm.objectsize
- [LLVMdev] Question about intrinsic function llvm.objectsize
- [LLVMdev] Question about intrinsic function llvm.objectsize
- [LLVMdev] Question about intrinsic function llvm.objectsize