> glibc does accept patches...or are you talking about two separate instances of this problem, both in glibc and something else?I originally saw this in a benchmark (which it may be possible to get changed) and only afterward found the glibc idiom. The most recent glibc code is a bit more complicated than I represented below. If you look up obstack.h you can see what’s there now. Basically, they’re computing a pointer alignment that may be based on some non-null base pointer or may be based on null, depending on the target architecture. I’m sure it’s possible to make it more sensible, but I’ve never interacted with the glibc community so I’m not sure how open they would be to changing this. I suspect I’d need a more compelling argument than clang-compatibility (or maybe even standards compatibility). From: James Y Knight [mailto:jyknight at google.com] Sent: Thursday, July 06, 2017 12:06 PM To: Kaylor, Andrew <andrew.kaylor at intel.com> Cc: llvm-dev <llvm-dev at lists.llvm.org> Subject: Re: [llvm-dev] GEP with a null pointer base On Thu, Jul 6, 2017 at 2:06 PM, Kaylor, Andrew via llvm-dev <llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org>> wrote: I've got a problem that I would like some input on. The problem basically boils down to a program that I am compiling, whose source I don't control, doing something like this: p = (char*)0 + n where 'n' is an intptr_t-sized value that the program knows is actually a valid address for a pointer. clang translates this as %p = getelementptr inbounds i8, i8* null, i64 %n So far, so good. The problem is that while LLVM seems to consider the above IR to be valid, we officially do not allow dereferencing a pointer constructed in this way (if I’m reading the rules correctly). Consequently, if this GEP ever gets close enough to a load using the pointer, InstCombine will eliminate the GEP and the load. I've been told that the '(char*)0 + n' construct is invalid according to the C standard. However, this pattern appears in the glibc malloc implementation, so I'd like to be able to handle it anyway. glibc does accept patches...or are you talking about two separate instances of this problem, both in glibc and something else? -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170706/47241897/attachment.html>
* Andrew via llvm-dev Kaylor:> The most recent glibc code is a bit more complicated than I > represented below. If you look up obstack.h you can see what’s there > now. Basically, they’re computing a pointer alignment that may be > based on some non-null base pointer or may be based on null, depending > on the target architecture. I’m sure it’s possible to make it more > sensible, but I’ve never interacted with the glibc community so I’m > not sure how open they would be to changing this. I suspect I’d need > a more compelling argument than clang-compatibility (or maybe even > standards compatibility).Not at all, we are fine with making installed headers more palatable to a wider range of compilers. That being said, I find obstacks rather horrible and wouldn't be surprised if they can't be implemented in a standard-conforming manner. Strictly speaking, you can't take a pointer, convert it to (u)intptr_t, twiddle some bits for alignment purposes, and cast it back to get another object pointer. But it seems to me that obstacks go beyond that in terms of C language abuse. The offset relative to NULL was probably needed for some old compilers. We should rewrite this using uintptr_t and __alignof__. If anyone wants to work on this and change the macros, we'd definitely review a patch submission (if it is short enough or is backed by a copyright assignment, sorry).
So, I actually think there's two separate parts of this. 1. Pointer math is required to stay within an object if the GEP has the "inbounds" attribute. Nothing is inbounds of the null pointer, so a GEP instruction with a null pointer, and non-zero offsets, is invalid on its face, even without the load. Now, you can tell clang to stop putting inbounds on pointer arithmetic GEPs by using -fwrapv or -fno-strict-overflow. Doing that doesn't fix the issue, because it's not actually the GEP that's being optimized in this particular case. 2. Separately, LLVM's pointer aliasing rules prohibit (even without the GEP marked inbounds) constructing a pointer to a random object with a GEP. ( http://llvm.org/docs/LangRef.html#pointeraliasing). This permission is used in InstCombineLoadStoreAlloca.cpp canSimplifyNullLoadOrGEP to eliminate the load based on a null pointer GEP, despite the non-zero offset to a not-inbounds GEP. To the first, asking people who do crazy stuff like this to compile with -fno-strict-overflow is reasonable...they're invoking C-level undefined behavior by doing arbitrary pointer-math, so they should use the flag that tells the compiler to make arbitrary pointer-math not be UB. And to the second, it seems to me that it would be fairly reasonable change to make the latter optimization dependent on the GEP being marked inbounds. (Even if the alias analysis still doesn't promise you anything here.) On Thu, Jul 6, 2017 at 3:17 PM, Kaylor, Andrew <andrew.kaylor at intel.com> wrote:> > glibc does accept patches...or are you talking about two separate > instances of this problem, both in glibc and something else? > > > > I originally saw this in a benchmark (which it may be possible to get > changed) and only afterward found the glibc idiom. > > > > The most recent glibc code is a bit more complicated than I represented > below. If you look up obstack.h you can see what’s there now. Basically, > they’re computing a pointer alignment that may be based on some non-null > base pointer or may be based on null, depending on the target > architecture. I’m sure it’s possible to make it more sensible, but I’ve > never interacted with the glibc community so I’m not sure how open they > would be to changing this. I suspect I’d need a more compelling argument > than clang-compatibility (or maybe even standards compatibility). > > > > *From:* James Y Knight [mailto:jyknight at google.com] > *Sent:* Thursday, July 06, 2017 12:06 PM > *To:* Kaylor, Andrew <andrew.kaylor at intel.com> > *Cc:* llvm-dev <llvm-dev at lists.llvm.org> > *Subject:* Re: [llvm-dev] GEP with a null pointer base > > > > > > > > On Thu, Jul 6, 2017 at 2:06 PM, Kaylor, Andrew via llvm-dev < > llvm-dev at lists.llvm.org> wrote: > > I've got a problem that I would like some input on. The problem basically > boils down to a program that I am compiling, whose source I don't control, > doing something like this: > > > > p = (char*)0 + n > > > > where 'n' is an intptr_t-sized value that the program knows is actually a > valid address for a pointer. > > > > clang translates this as > > > > %p = getelementptr inbounds i8, i8* null, i64 %n > > > > So far, so good. The problem is that while LLVM seems to consider the > above IR to be valid, we officially do not allow dereferencing a pointer > constructed in this way (if I’m reading the rules correctly). > Consequently, if this GEP ever gets close enough to a load using the > pointer, InstCombine will eliminate the GEP and the load. > > > > I've been told that the '(char*)0 + n' construct is invalid according to > the C standard. However, this pattern appears in the glibc malloc > implementation, so I'd like to be able to handle it anyway. > > > > glibc does accept patches...or are you talking about two separate > instances of this problem, both in glibc and something else? > > > > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170706/62e8b22f/attachment.html>
* James Y. Knight via llvm-dev:> To the first, asking people who do crazy stuff like this to compile with > -fno-strict-overflow is reasonable...they're invoking C-level undefined > behavior by doing arbitrary pointer-math, so they should use the flag that > tells the compiler to make arbitrary pointer-math not be UB.Bit this comes from a glibc header, so ideally, it should be fixed to work with all compiler modes (with the caveat that some things that obstack does may never be valid C code).
Responded to this earlier but didn't get llvm-dev on the list: This is the commit that caused the newest revision of the problem: https://github.com/lattera/glibc/commit/2fd4de4b15a66f821057af90714145d2c034a609 Note that it removes the OLDER version of the problem (__PTR_TO_INT/__INT_TO_PTR). The interesting part of the old version is the comment: -/* We use subtraction of (char *) 0 instead of casting to int - because on word-addressable machines a simple cast to int - may ignore the byte-within-word field of the pointer. */ I'll note that I tried both the new and old version, and think Clang's emit for this pattern (GEP(i8,nullptr, %SOMETHING)) will correct this issue. -----Original Message----- From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of Florian Weimer via llvm-dev Sent: Thursday, July 6, 2017 1:06 PM To: Kaylor, Andrew via llvm-dev <llvm-dev at lists.llvm.org> Subject: Re: [llvm-dev] GEP with a null pointer base * Andrew via llvm-dev Kaylor:> The most recent glibc code is a bit more complicated than I > represented below. If you look up obstack.h you can see what’s there > now. Basically, they’re computing a pointer alignment that may be > based on some non-null base pointer or may be based on null, depending > on the target architecture. I’m sure it’s possible to make it more > sensible, but I’ve never interacted with the glibc community so I’m > not sure how open they would be to changing this. I suspect I’d need > a more compelling argument than clang-compatibility (or maybe even > standards compatibility).Not at all, we are fine with making installed headers more palatable to a wider range of compilers. That being said, I find obstacks rather horrible and wouldn't be surprised if they can't be implemented in a standard-conforming manner. Strictly speaking, you can't take a pointer, convert it to (u)intptr_t, twiddle some bits for alignment purposes, and cast it back to get another object pointer. But it seems to me that obstacks go beyond that in terms of C language abuse. The offset relative to NULL was probably needed for some old compilers. We should rewrite this using uintptr_t and __alignof__. If anyone wants to work on this and change the macros, we'd definitely review a patch submission (if it is short enough or is backed by a copyright assignment, sorry). _______________________________________________ LLVM Developers mailing list llvm-dev at lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev