Hello Eugene, The underlying problem here is that your reasoning below follows C rules, while BasicAA and other LLVM passes must follow LLVM IR rules, which are different in this area. In LLVM IR, memory has no types. Translating from C to LLVM IR does not preserve this information. In addition to being the reason BasicAA can't reason about array bounds and struct members as you describe, it's also the reason BasicAA can't do type-based alias analysis. The expected solution for type-based alias analysis uses metadata to describe additional guarantees that C makes. In a similar manner, metadata could perhaps be used to describe C array bounds and struct member rules. This overall design has tradeoffs; there are advantages as well as disadvantages. Dan On Jun 26, 2010, at 11:55 AM, Eugene Toder wrote:> John, David, > > That's not the whole story, though. For example, 6.7.2.1/13 of C99 > allows a conversion between a pointer to struct and it's first member > and also specifies that members of the struct are laid out in the > order of declaration. To derive pointer to one field from another it's > sufficient to 1) cast pointer to a struct to a pointer to array of > chars 2) perform arithmetics as per 6.5.6 3) cast the pointer to > pointer to a field. I have not found paragraphs explicitly allowing or > denying 1 and 3, however there are many hints -- offsetof macro, which > would have no use otherwise; 6.2.6/4, which allows to copy any object > into array of chars; aliasing rules of 6.5, which allow accessing any > object through char*. > > On the original subject -- in this example, isn't it guaranteed that > _length and _data[i] do not alias? i is non-negative (if it's unsigned > -- by definition, if it's signed -- because signed overflow is > undefined behaviour), so &_data[i] >= &_data > &_length. > > Eugene > > On Thu, Jun 17, 2010 at 6:58 PM, John McCall <rjmccall at apple.com> wrote: >> >> On Jun 17, 2010, at 10:42 AM, Andrew Haley wrote: >> >> On 06/17/2010 06:19 PM, Jeffrey Yasskin wrote: >> >> On Thu, Jun 17, 2010 at 12:22 AM, Eli Friedman <eli.friedman at gmail.com> >> wrote: >> >> On Wed, Jun 16, 2010 at 11:14 PM, Pierre C <lists at peufeu.com> wrote: >> >> There are essentially two ways to "solve" this issue: one is >> >> type-based alias analysis, i.e. assuming "double" and "int" don't >> >> alias; LLVM doesn't implement this at the moment. The other is to >> >> attempt to analyze the loop and prove that %indvar.i is never >> >> negative; LLVM doesn't implement anything like this at the moment >> >> either. >> >> -Eli >> >> Actually I think it's much simpler than that... >> >> http://llvm.org/releases/1.3/docs/AliasAnalysis.html#basic-aa >> >> it says says "Different fields of a structure do not alias." >> >> This is the case here : we have two different fields of a struct however it >> >> mistakenly thinks they alias. >> >> Consider a case like the following: >> >> struct X { int a; int b[10]; }; >> >> int f(struct X* a) { a->b[-1] = 1; return a->a; } >> >> This is technically illegal code, but various programs depend on >> >> constructs like this working. >> >> I don't know if it's illegal, but this is how libstdc++'s string >> >> implementation finds its header data. std::string stores a pointer >> >> directly to the character data (making subscripting slightly faster), >> >> and then subtracts the size of the header when it needs to do any >> >> bookkeeping. >> >> Character types are special: they can alias everything. if this weren't >> the case you couldn't write malloc(). >> >> Do you have a reference to the standard that makes it undefined? >> >> Several places, but 6.3.2.3 of C99 is the most important: it lists all >> the legal pointer conversions. >> >> The better reference is 6.5.6p7-8, about pointer addition: >> For the purposes of these operators, a pointer to an object that is not an >> element of an array behaves the same as a pointer to the first element of an >> array of length one with the type of the object as its element type. >> If both the pointer operand and the result point to elements of the same >> array object, or one past the last element of the array object, the >> evaluation shall not produce an overflow; otherwise, the behavior is >> undefined. >> John. >> _______________________________________________ >> LLVM Developers mailing list >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >> >> > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Hi Dan, Are you referring to my reasoning for why _length and _data[i] do not alias? I don't think this needs TBAA or any "strict" aliasing rules. All that sufficient is 1) assumption about struct layout: offsetof(_length) < offsetof(_data) 2) assumption that i >= 0. My understanding is that 1) is guaranteed by llvm rules and 2) by C rules, however I'm not sure how to express C rules in llvm IR. For signed types "nsw" flag on arithmetic seems close, if I understand the description of trap values. For unsigned types arithmetic is fine, but we probably need "unsigned" flag in getelementptr to signal that offset is treated as unsigned. In fact, as Takumi pointed out, if you add an extra comparison to ensure 2) SCEV can already optimize the loop. Or am I missing something? Eugene On Sun, Jun 27, 2010 at 9:49 PM, Dan Gohman <gohman at apple.com> wrote:> Hello Eugene, > > The underlying problem here is that your reasoning below follows > C rules, while BasicAA and other LLVM passes must follow > LLVM IR rules, which are different in this area. In LLVM IR, memory > has no types. Translating from C to LLVM IR does not preserve > this information. > > In addition to being the reason BasicAA can't reason about array > bounds and struct members as you describe, it's also the reason > BasicAA can't do type-based alias analysis. > > The expected solution for type-based alias analysis uses > metadata to describe additional guarantees that C makes. In a > similar manner, metadata could perhaps be used to describe C > array bounds and struct member rules. > > This overall design has tradeoffs; there are advantages as well > as disadvantages. > > Dan > > On Jun 26, 2010, at 11:55 AM, Eugene Toder wrote: > >> John, David, >> >> That's not the whole story, though. For example, 6.7.2.1/13 of C99 >> allows a conversion between a pointer to struct and it's first member >> and also specifies that members of the struct are laid out in the >> order of declaration. To derive pointer to one field from another it's >> sufficient to 1) cast pointer to a struct to a pointer to array of >> chars 2) perform arithmetics as per 6.5.6 3) cast the pointer to >> pointer to a field. I have not found paragraphs explicitly allowing or >> denying 1 and 3, however there are many hints -- offsetof macro, which >> would have no use otherwise; 6.2.6/4, which allows to copy any object >> into array of chars; aliasing rules of 6.5, which allow accessing any >> object through char*. >> >> On the original subject -- in this example, isn't it guaranteed that >> _length and _data[i] do not alias? i is non-negative (if it's unsigned >> -- by definition, if it's signed -- because signed overflow is >> undefined behaviour), so &_data[i] >= &_data > &_length. >> >> Eugene >> >> On Thu, Jun 17, 2010 at 6:58 PM, John McCall <rjmccall at apple.com> wrote: >>> >>> On Jun 17, 2010, at 10:42 AM, Andrew Haley wrote: >>> >>> On 06/17/2010 06:19 PM, Jeffrey Yasskin wrote: >>> >>> On Thu, Jun 17, 2010 at 12:22 AM, Eli Friedman <eli.friedman at gmail.com> >>> wrote: >>> >>> On Wed, Jun 16, 2010 at 11:14 PM, Pierre C <lists at peufeu.com> wrote: >>> >>> There are essentially two ways to "solve" this issue: one is >>> >>> type-based alias analysis, i.e. assuming "double" and "int" don't >>> >>> alias; LLVM doesn't implement this at the moment. The other is to >>> >>> attempt to analyze the loop and prove that %indvar.i is never >>> >>> negative; LLVM doesn't implement anything like this at the moment >>> >>> either. >>> >>> -Eli >>> >>> Actually I think it's much simpler than that... >>> >>> http://llvm.org/releases/1.3/docs/AliasAnalysis.html#basic-aa >>> >>> it says says "Different fields of a structure do not alias." >>> >>> This is the case here : we have two different fields of a struct however it >>> >>> mistakenly thinks they alias. >>> >>> Consider a case like the following: >>> >>> struct X { int a; int b[10]; }; >>> >>> int f(struct X* a) { a->b[-1] = 1; return a->a; } >>> >>> This is technically illegal code, but various programs depend on >>> >>> constructs like this working. >>> >>> I don't know if it's illegal, but this is how libstdc++'s string >>> >>> implementation finds its header data. std::string stores a pointer >>> >>> directly to the character data (making subscripting slightly faster), >>> >>> and then subtracts the size of the header when it needs to do any >>> >>> bookkeeping. >>> >>> Character types are special: they can alias everything. if this weren't >>> the case you couldn't write malloc(). >>> >>> Do you have a reference to the standard that makes it undefined? >>> >>> Several places, but 6.3.2.3 of C99 is the most important: it lists all >>> the legal pointer conversions. >>> >>> The better reference is 6.5.6p7-8, about pointer addition: >>> For the purposes of these operators, a pointer to an object that is not an >>> element of an array behaves the same as a pointer to the first element of an >>> array of length one with the type of the object as its element type. >>> If both the pointer operand and the result point to elements of the same >>> array object, or one past the last element of the array object, the >>> evaluation shall not produce an overflow; otherwise, the behavior is >>> undefined. >>> John. >>> _______________________________________________ >>> LLVM Developers mailing list >>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >>> >>> >> >> _______________________________________________ >> LLVM Developers mailing list >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > >
On Jun 27, 2010, at 2:26 PM, Eugene Toder wrote:> Hi Dan, > > Are you referring to my reasoning for why _length and _data[i] do not > alias?No, I was referring to the discussion of C99 6.7.2.1, 6.5.6, 6.2.6, and so on.> I don't think this needs TBAA or any "strict" aliasing rules. > All that sufficient is 1) assumption about struct layout: > offsetof(_length) < offsetof(_data) 2) assumption that i >= 0. > My understanding is that 1) is guaranteed by llvm rules and 2) by C > rules, however I'm not sure how to express C rules in llvm IR. For > signed types "nsw" flag on arithmetic seems close, if I understand the > description of trap values.Right; this is the second of the two approaches Eli originally suggested. For this approach, the IR is already sufficient to allow the optimizer to prove that i >=0 and to subsequently prove that the relevant pointers don't alias. BasicAA doesn't have any logic related to proving either i >= 0, or that _length and _data don't alias even if it somehow knew i >= 0. It could be taught both of those things, if someone were interested. SCEV-AA does have logic related to proving that i >= 0, but it happens to fail in this testcase; I haven't investigated it in detail. It also has logic for proving that _length and _data don't alias if i >= 0. However, SCEV-AA is not enabled by default.> For unsigned types arithmetic is fine, but > we probably need "unsigned" flag in getelementptr to signal that > offset is treated as unsigned. > > In fact, as Takumi pointed out, if you add an extra comparison to > ensure 2) SCEV can already optimize the loop. > Or am I missing something?With that extra comparison, SCEV-AA succeeds in proving that i >=0, and this shows it can do everything else from there. Dan