I’m not entirely opposed to solution #3. As I said, my concern is that there are cases it would miss. For instance, if I had some code like this: char *get_ptr(char *base, intptr_t offset) { return base + offset; } char *convert_to_ptr(intptr_t ptr_val) { return get_ptr((char*)0, ptr_val); } There the idiom would only appear after inlining, so the front end couldn’t handle it. The current glibc code is implemented with a couple of layers of macros that have a logical branch that could theoretically result in the null coming in via a PHI, but some early investigation makes it look like the choice between null and something else is actually resolved in the front end in some way. If that holds up and it turns out that I don’t have any actual programs where the front end can’t spot this idiom (and I agree it’s horrible) maybe it would be acceptable to not handle the theoretical cases. -Andy From: Chris Lattner [mailto:clattner at nondot.org] Sent: Thursday, July 06, 2017 11:53 AM To: Kaylor, Andrew <andrew.kaylor at intel.com> Cc: llvm-dev <llvm-dev at lists.llvm.org> Subject: Re: [llvm-dev] GEP with a null pointer base On Jul 6, 2017, at 11:06 AM, Kaylor, Andrew via llvm-dev <llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org>> wrote: I've got a problem that I would like some input on. The problem basically boils down to a program that I am compiling, whose source I don't control, doing something like this: p = (char*)0 + n 3) Have the front end recognize this particular idiom and translate it directly as inttoptr. We like the first solution best. The second "solution" is basically a punt. It does away with the immediate problem but leaves the code basically working by chance. I think the third solution is incomplete, because it relies on the front end being able to detect the use of a null pointer whereas that might not emerge until a few basic optimizations have been performed. I was hoping to get some more input on this matter before proceeding. Personally, I’d prefer #3 for two reasons: - This is a very C specific weirdness, so putting it into the frontend makes sense. - This is really about supporting a specific (horrible :-) idiom. It makes sense to recognize this in the frontend, which is close to the idiom truth, rather than in the optimizer, which is run multiple times and sees code after being transformed. I see this as pretty similar to the analogous hacks we do to support broken offsetof idioms. -Chris -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170706/adec292d/attachment.html>
> On Jul 6, 2017, at 12:28 PM, Kaylor, Andrew <andrew.kaylor at intel.com> wrote: > > I’m not entirely opposed to solution #3. As I said, my concern is that there are cases it would miss. > > For instance, if I had some code like this: > > char *get_ptr(char *base, intptr_t offset) { > return base + offset; > } > > char *convert_to_ptr(intptr_t ptr_val) { > return get_ptr((char*)0, ptr_val); > } > > There the idiom would only appear after inlining, so the front end couldn’t handle it.Agreed, but that is a good thing, not a bad thing. The whole point here is that we want to limit the impact to only occur for the cases that matter. This would be a source compatibility hack, not a general extension to the IR.> The current glibc code is implemented with a couple of layers of macros that have a logical branch that could theoretically result in the null coming in via a PHI, but some early investigation makes it look like the choice between null and something else is actually resolved in the front end in some way. If that holds up and it turns out that I don’t have any actual programs where the front end can’t spot this idiom (and I agree it’s horrible) maybe it would be acceptable to not handle the theoretical cases.Macros shouldn’t pose a problem. Clang has several recursive AST walkers that could help you. For example, it has a “is a null pointer expression” predicate, which will walk through casts and other nonsense that may be in the way of detecting the zero. -Chris -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170706/29422ac0/attachment.html>
That works for me. The front end solution was actually the first one I tried out, so I can confirm that it fixes the problem for the benchmark in question (which Chad Rosier correctly identified). It feels like a hack, but it sounds like maybe you’d prefer that it feel like a hack (which makes sense since it is, in fact, a hack no matter where we put it). From: Chris Lattner [mailto:clattner at nondot.org] Sent: Thursday, July 06, 2017 12:55 PM To: Kaylor, Andrew <andrew.kaylor at intel.com> Cc: llvm-dev <llvm-dev at lists.llvm.org> Subject: Re: [llvm-dev] GEP with a null pointer base On Jul 6, 2017, at 12:28 PM, Kaylor, Andrew <andrew.kaylor at intel.com<mailto:andrew.kaylor at intel.com>> wrote: I’m not entirely opposed to solution #3. As I said, my concern is that there are cases it would miss. For instance, if I had some code like this: char *get_ptr(char *base, intptr_t offset) { return base + offset; } char *convert_to_ptr(intptr_t ptr_val) { return get_ptr((char*)0, ptr_val); } There the idiom would only appear after inlining, so the front end couldn’t handle it. Agreed, but that is a good thing, not a bad thing. The whole point here is that we want to limit the impact to only occur for the cases that matter. This would be a source compatibility hack, not a general extension to the IR. The current glibc code is implemented with a couple of layers of macros that have a logical branch that could theoretically result in the null coming in via a PHI, but some early investigation makes it look like the choice between null and something else is actually resolved in the front end in some way. If that holds up and it turns out that I don’t have any actual programs where the front end can’t spot this idiom (and I agree it’s horrible) maybe it would be acceptable to not handle the theoretical cases. Macros shouldn’t pose a problem. Clang has several recursive AST walkers that could help you. For example, it has a “is a null pointer expression” predicate, which will walk through casts and other nonsense that may be in the way of detecting the zero. -Chris -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170706/d2cfd8ec/attachment-0001.html>