I’m not entirely opposed to solution #3. As I said, my concern is that there
are cases it would miss.
For instance, if I had some code like this:
char *get_ptr(char *base, intptr_t offset) {
return base + offset;
}
char *convert_to_ptr(intptr_t ptr_val) {
return get_ptr((char*)0, ptr_val);
}
There the idiom would only appear after inlining, so the front end couldn’t
handle it. The current glibc code is implemented with a couple of layers of
macros that have a logical branch that could theoretically result in the null
coming in via a PHI, but some early investigation makes it look like the choice
between null and something else is actually resolved in the front end in some
way. If that holds up and it turns out that I don’t have any actual programs
where the front end can’t spot this idiom (and I agree it’s horrible) maybe it
would be acceptable to not handle the theoretical cases.
-Andy
From: Chris Lattner [mailto:clattner at nondot.org]
Sent: Thursday, July 06, 2017 11:53 AM
To: Kaylor, Andrew <andrew.kaylor at intel.com>
Cc: llvm-dev <llvm-dev at lists.llvm.org>
Subject: Re: [llvm-dev] GEP with a null pointer base
On Jul 6, 2017, at 11:06 AM, Kaylor, Andrew via llvm-dev <llvm-dev at
lists.llvm.org<mailto:llvm-dev at lists.llvm.org>> wrote:
I've got a problem that I would like some input on. The problem basically
boils down to a program that I am compiling, whose source I don't control,
doing something like this:
p = (char*)0 + n
3) Have the front end recognize this particular idiom and translate it directly
as inttoptr.
We like the first solution best. The second "solution" is basically a
punt. It does away with the immediate problem but leaves the code basically
working by chance. I think the third solution is incomplete, because it relies
on the front end being able to detect the use of a null pointer whereas that
might not emerge until a few basic optimizations have been performed.
I was hoping to get some more input on this matter before proceeding.
Personally, I’d prefer #3 for two reasons:
- This is a very C specific weirdness, so putting it into the frontend makes
sense.
- This is really about supporting a specific (horrible :-) idiom. It makes
sense to recognize this in the frontend, which is close to the idiom truth,
rather than in the optimizer, which is run multiple times and sees code after
being transformed.
I see this as pretty similar to the analogous hacks we do to support broken
offsetof idioms.
-Chris
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170706/adec292d/attachment.html>
> On Jul 6, 2017, at 12:28 PM, Kaylor, Andrew <andrew.kaylor at intel.com> wrote: > > I’m not entirely opposed to solution #3. As I said, my concern is that there are cases it would miss. > > For instance, if I had some code like this: > > char *get_ptr(char *base, intptr_t offset) { > return base + offset; > } > > char *convert_to_ptr(intptr_t ptr_val) { > return get_ptr((char*)0, ptr_val); > } > > There the idiom would only appear after inlining, so the front end couldn’t handle it.Agreed, but that is a good thing, not a bad thing. The whole point here is that we want to limit the impact to only occur for the cases that matter. This would be a source compatibility hack, not a general extension to the IR.> The current glibc code is implemented with a couple of layers of macros that have a logical branch that could theoretically result in the null coming in via a PHI, but some early investigation makes it look like the choice between null and something else is actually resolved in the front end in some way. If that holds up and it turns out that I don’t have any actual programs where the front end can’t spot this idiom (and I agree it’s horrible) maybe it would be acceptable to not handle the theoretical cases.Macros shouldn’t pose a problem. Clang has several recursive AST walkers that could help you. For example, it has a “is a null pointer expression” predicate, which will walk through casts and other nonsense that may be in the way of detecting the zero. -Chris -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170706/29422ac0/attachment.html>
That works for me. The front end solution was actually the first one I tried
out, so I can confirm that it fixes the problem for the benchmark in question
(which Chad Rosier correctly identified). It feels like a hack, but it sounds
like maybe you’d prefer that it feel like a hack (which makes sense since it is,
in fact, a hack no matter where we put it).
From: Chris Lattner [mailto:clattner at nondot.org]
Sent: Thursday, July 06, 2017 12:55 PM
To: Kaylor, Andrew <andrew.kaylor at intel.com>
Cc: llvm-dev <llvm-dev at lists.llvm.org>
Subject: Re: [llvm-dev] GEP with a null pointer base
On Jul 6, 2017, at 12:28 PM, Kaylor, Andrew <andrew.kaylor at
intel.com<mailto:andrew.kaylor at intel.com>> wrote:
I’m not entirely opposed to solution #3. As I said, my concern is that there
are cases it would miss.
For instance, if I had some code like this:
char *get_ptr(char *base, intptr_t offset) {
return base + offset;
}
char *convert_to_ptr(intptr_t ptr_val) {
return get_ptr((char*)0, ptr_val);
}
There the idiom would only appear after inlining, so the front end couldn’t
handle it.
Agreed, but that is a good thing, not a bad thing. The whole point here is that
we want to limit the impact to only occur for the cases that matter. This would
be a source compatibility hack, not a general extension to the IR.
The current glibc code is implemented with a couple of layers of macros that
have a logical branch that could theoretically result in the null coming in via
a PHI, but some early investigation makes it look like the choice between null
and something else is actually resolved in the front end in some way. If that
holds up and it turns out that I don’t have any actual programs where the front
end can’t spot this idiom (and I agree it’s horrible) maybe it would be
acceptable to not handle the theoretical cases.
Macros shouldn’t pose a problem. Clang has several recursive AST walkers that
could help you. For example, it has a “is a null pointer expression” predicate,
which will walk through casts and other nonsense that may be in the way of
detecting the zero.
-Chris
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170706/d2cfd8ec/attachment-0001.html>