Hi everyone, I've got a problem that I would like some input on. The problem basically boils down to a program that I am compiling, whose source I don't control, doing something like this: p = (char*)0 + n where 'n' is an intptr_t-sized value that the program knows is actually a valid address for a pointer. clang translates this as %p = getelementptr inbounds i8, i8* null, i64 %n So far, so good. The problem is that while LLVM seems to consider the above IR to be valid, we officially do not allow dereferencing a pointer constructed in this way (if I'm reading the rules correctly). Consequently, if this GEP ever gets close enough to a load using the pointer, InstCombine will eliminate the GEP and the load. I've been told that the '(char*)0 + n' construct is invalid according to the C standard. However, this pattern appears in the glibc malloc implementation, so I'd like to be able to handle it anyway. I've discussed this with a few co-workers and we've considered a few possible solutions: 1) Add a new transformation to InstCombine that will replace 'getelementptr i8, i8* null, <ty> %n' with 'inttoptr <ty> %n to i8*' when <ty> has the same size as a pointer for the target architecture. 2) Disable the existing transformation in InstCombine that eliminates the GEP+load. 3) Have the front end recognize this particular idiom and translate it directly as inttoptr. We like the first solution best. The second "solution" is basically a punt. It does away with the immediate problem but leaves the code basically working by chance. I think the third solution is incomplete, because it relies on the front end being able to detect the use of a null pointer whereas that might not emerge until a few basic optimizations have been performed. I was hoping to get some more input on this matter before proceeding. What do you think? Thanks, Andy -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170706/ce8b7f7f/attachment.html>
> On Jul 6, 2017, at 11:06 AM, Kaylor, Andrew via llvm-dev <llvm-dev at lists.llvm.org> wrote: > > I've got a problem that I would like some input on. The problem basically boils down to a program that I am compiling, whose source I don't control, doing something like this: > > p = (char*)0 + n > 3) Have the front end recognize this particular idiom and translate it directly as inttoptr. > > We like the first solution best. The second "solution" is basically a punt. It does away with the immediate problem but leaves the code basically working by chance. I think the third solution is incomplete, because it relies on the front end being able to detect the use of a null pointer whereas that might not emerge until a few basic optimizations have been performed. > > I was hoping to get some more input on this matter before proceeding. >Personally, I’d prefer #3 for two reasons: - This is a very C specific weirdness, so putting it into the frontend makes sense. - This is really about supporting a specific (horrible :-) idiom. It makes sense to recognize this in the frontend, which is close to the idiom truth, rather than in the optimizer, which is run multiple times and sees code after being transformed. I see this as pretty similar to the analogous hacks we do to support broken offsetof idioms. -Chris -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170706/bcd9959f/attachment-0001.html>
On Thu, Jul 6, 2017 at 2:06 PM, Kaylor, Andrew via llvm-dev < llvm-dev at lists.llvm.org> wrote:> I've got a problem that I would like some input on. The problem basically > boils down to a program that I am compiling, whose source I don't control, > doing something like this: > > > > p = (char*)0 + n > > > > where 'n' is an intptr_t-sized value that the program knows is actually a > valid address for a pointer. > > > > clang translates this as > > > > %p = getelementptr inbounds i8, i8* null, i64 %n > > > > So far, so good. The problem is that while LLVM seems to consider the > above IR to be valid, we officially do not allow dereferencing a pointer > constructed in this way (if I’m reading the rules correctly). > Consequently, if this GEP ever gets close enough to a load using the > pointer, InstCombine will eliminate the GEP and the load. > > > > I've been told that the '(char*)0 + n' construct is invalid according to > the C standard. However, this pattern appears in the glibc malloc > implementation, so I'd like to be able to handle it anyway. >glibc does accept patches...or are you talking about two separate instances of this problem, both in glibc and something else? -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170706/d44700d6/attachment.html>
> glibc does accept patches...or are you talking about two separate instances of this problem, both in glibc and something else?I originally saw this in a benchmark (which it may be possible to get changed) and only afterward found the glibc idiom. The most recent glibc code is a bit more complicated than I represented below. If you look up obstack.h you can see what’s there now. Basically, they’re computing a pointer alignment that may be based on some non-null base pointer or may be based on null, depending on the target architecture. I’m sure it’s possible to make it more sensible, but I’ve never interacted with the glibc community so I’m not sure how open they would be to changing this. I suspect I’d need a more compelling argument than clang-compatibility (or maybe even standards compatibility). From: James Y Knight [mailto:jyknight at google.com] Sent: Thursday, July 06, 2017 12:06 PM To: Kaylor, Andrew <andrew.kaylor at intel.com> Cc: llvm-dev <llvm-dev at lists.llvm.org> Subject: Re: [llvm-dev] GEP with a null pointer base On Thu, Jul 6, 2017 at 2:06 PM, Kaylor, Andrew via llvm-dev <llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org>> wrote: I've got a problem that I would like some input on. The problem basically boils down to a program that I am compiling, whose source I don't control, doing something like this: p = (char*)0 + n where 'n' is an intptr_t-sized value that the program knows is actually a valid address for a pointer. clang translates this as %p = getelementptr inbounds i8, i8* null, i64 %n So far, so good. The problem is that while LLVM seems to consider the above IR to be valid, we officially do not allow dereferencing a pointer constructed in this way (if I’m reading the rules correctly). Consequently, if this GEP ever gets close enough to a load using the pointer, InstCombine will eliminate the GEP and the load. I've been told that the '(char*)0 + n' construct is invalid according to the C standard. However, this pattern appears in the glibc malloc implementation, so I'd like to be able to handle it anyway. glibc does accept patches...or are you talking about two separate instances of this problem, both in glibc and something else? -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170706/47241897/attachment.html>
I’m not entirely opposed to solution #3. As I said, my concern is that there are cases it would miss. For instance, if I had some code like this: char *get_ptr(char *base, intptr_t offset) { return base + offset; } char *convert_to_ptr(intptr_t ptr_val) { return get_ptr((char*)0, ptr_val); } There the idiom would only appear after inlining, so the front end couldn’t handle it. The current glibc code is implemented with a couple of layers of macros that have a logical branch that could theoretically result in the null coming in via a PHI, but some early investigation makes it look like the choice between null and something else is actually resolved in the front end in some way. If that holds up and it turns out that I don’t have any actual programs where the front end can’t spot this idiom (and I agree it’s horrible) maybe it would be acceptable to not handle the theoretical cases. -Andy From: Chris Lattner [mailto:clattner at nondot.org] Sent: Thursday, July 06, 2017 11:53 AM To: Kaylor, Andrew <andrew.kaylor at intel.com> Cc: llvm-dev <llvm-dev at lists.llvm.org> Subject: Re: [llvm-dev] GEP with a null pointer base On Jul 6, 2017, at 11:06 AM, Kaylor, Andrew via llvm-dev <llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org>> wrote: I've got a problem that I would like some input on. The problem basically boils down to a program that I am compiling, whose source I don't control, doing something like this: p = (char*)0 + n 3) Have the front end recognize this particular idiom and translate it directly as inttoptr. We like the first solution best. The second "solution" is basically a punt. It does away with the immediate problem but leaves the code basically working by chance. I think the third solution is incomplete, because it relies on the front end being able to detect the use of a null pointer whereas that might not emerge until a few basic optimizations have been performed. I was hoping to get some more input on this matter before proceeding. Personally, I’d prefer #3 for two reasons: - This is a very C specific weirdness, so putting it into the frontend makes sense. - This is really about supporting a specific (horrible :-) idiom. It makes sense to recognize this in the frontend, which is close to the idiom truth, rather than in the optimizer, which is run multiple times and sees code after being transformed. I see this as pretty similar to the analogous hacks we do to support broken offsetof idioms. -Chris -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170706/adec292d/attachment.html>
On 7/6/2017 3:05 PM, James Y Knight via llvm-dev wrote:> > > On Thu, Jul 6, 2017 at 2:06 PM, Kaylor, Andrew via llvm-dev > <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> wrote: > > I've got a problem that I would like some input on. The problem > basically boils down to a program that I am compiling, whose > source I don't control, doing something like this: > > p = (char*)0 + n > > where 'n' is an intptr_t-sized value that the program knows is > actually a valid address for a pointer. > > clang translates this as > > %p = getelementptr inbounds i8, i8* null, i64 %n > > So far, so good. The problem is that while LLVM seems to consider > the above IR to be valid, we officially do not allow dereferencing > a pointer constructed in this way (if I’m reading the rules > correctly). Consequently, if this GEP ever gets close enough to a > load using the pointer, InstCombine will eliminate the GEP and the > load. > > I've been told that the '(char*)0 + n' construct is invalid > according to the C standard. However, this pattern appears in the > glibc malloc implementation, so I'd like to be able to handle it > anyway. > > > glibc does accept patches...or are you talking about two separate > instances of this problem, both in glibc and something else?FWIW and IIUC this same issue occurs in SPEC2017/gcc (with ref input) and I hear getting patches past the SPEC committee is incredibly difficult...> > > > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170706/4ce2448a/attachment.html>