thr3ads.net - llvm dev - [LLVMdev] Pointer vs Integer classification (was Re: make DataLayout a mandatory part of Module) [Feb 2014]

Nick Lewycky

2014-Feb-11 01:25 UTC

[LLVMdev] make DataLayout a mandatory part of Module

On 5 February 2014 09:45, Philip Reames <listmail at philipreames.com> wrote:
>  On 1/31/14 5:23 PM, Nick Lewycky wrote:
>
> On 30 January 2014 09:55, Philip Reames <listmail at philipreames.com> wrote:
>
>> On 1/29/14 3:40 PM, Nick Lewycky wrote:
>>
>>> The LLVM Module has an optional target triple and target datalayout.
>>> Without them, an llvm::DataLayout can't be constructed with meaningful
>>> data. The benefit to making them optional is to permit optimization that
>>> would work across all possible DataLayouts, then allow us to commit to a
>>> particular one at a later point in time, thereby performing more
>>> optimization in advance.
>>>
>>> This feature is not being used. Instead, every user of LLVM IR in a
>>> portability system defines one or more standardized datalayouts for their
>>> platform, and shims to place calls with the outside world. The primary
>>> reason for this is that independence from DataLayout is not sufficient to
>>> achieve portability because it doesn't also represent ABI lowering
>>> constraints. If you have a system that attempts to use LLVM IR in a
>>> portable fashion and does it without standardizing on a datalayout, please
>>> share your experience.
>>>
>>  Nick, I don't have a current system in place, but I do want to put
>> forward an alternate perspective.
>>
>> We've been looking at doing late insertion of safepoints for garbage
>> collection.  One of the properties that we end up needing to preserve
>> through all the optimizations which precede our custom rewriting phase is
>> that the optimizer has not chosen to "hide" pointers from us by using
>> ptrtoint and integer math tricks. Currently, we're simply running a
>> verification pass before our rewrite, but I'm very interested long term in
>> constructing ways to ensure a "gc safe" set of optimization passes.
>>
>
>  As a general rule passes need to support the whole of what the IR can
> support. Trying to operate on a subset of IR seems like a losing battle,
> unless you can show a mapping from one to the other (ie., using code
> duplication to remove all unnatural loops from IR, or collapsing a function
> to having a single exit node).
>
>  What language were you planning to do this for? Does the language permit
> the user to convert pointers to integers and vice versa? If so, what do you
> do if the user program writes a pointer out to a file, reads it back in
> later, and uses it?
>
> Java - which does not permit arbitrary pointer manipulation.  (Well,
> without resorting to mechanisms like JNI and sun.misc.Unsafe.  Doing so
> would be explicitly undefined behavior though.)  We also use raw pointer
> manipulations in our implementation (which is eventually inlined), but this
> happens after the safepoint insertion rewrite.
>
> We strictly control the input IR.  As a result, I can ensure that the
> initial IR meets our subset requirements.  In practice, all of the opto
> passes appear to preserve these invariants (i.e. not introducing inttoptr),
> but we'd like to justify that a bit more.
>
>
>  One of the ways I've been thinking about - but haven't actually
>> implemented yet - is to deny the optimization passes information about
>> pointer sizing.
>
>
>  Right, pointer size (address space size) will become known to all parts
> of the compiler. It's not even going to be just the optimizations,
> ConstantExpr::get is going to grow smarter because of this, as
> lib/Analysis/ConstantFolding.cpp merges into lib/IR/ConstantFold.cpp. That
> is one of the major benefits that's driving this. (All parts of the
> compiler will also know endian-ness, which means we can constant fold
> loads, too.)
>
> I would argue that all of the pieces you mentioned are performing
> optimizations.  :)  However, the exact semantics are unimportant for the
> overall discussion.
>
>
>  Under the assumption that an opto pass can't insert a ptrtoint cast
>> without knowing a safe integer size to use, this seems like it would outlaw
>> a class of optimizations we'd be broken by.
>>
>
>  Optimization passes generally prefer converting ptrtoint and inttoptr to
> GEPs whenever possible.
>
> This is good to hear and helps us.
>
>   I expect that we'll end up with *fewer* ptr<->int conversions with this
> change, because we'll know enough about the target to convert them into
> GEPs.
>
> Er, I'm confused by this.  Why would not knowing the size of a pointer
> cause a GEP to be converted to a ptr <-> int conversion?
>
Having target data means we can convert inttoptr/ptrtoint into GEPs,
particularly in constant expression folding.
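
A minimal sketch of why the fold above depends on target data, under stated assumptions: `IntToPtrOfAdd`, `ByteGep`, and `foldToGep` are hypothetical names in a toy model and are not part of LLVM's API. The toy rewrites `inttoptr (add (ptrtoint @base), offset)` as a byte-offset GEP only when the pointer width is known and matches the integer width, which is deliberately conservative.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical toy model, not LLVM's API. Represents the constant expression
//   inttoptr (add (ptrtoint @base to iN), offset)
struct IntToPtrOfAdd {
    std::string base;     // name of the global being converted
    unsigned intBits;     // width N of the intermediate integer type
    std::int64_t offset;  // constant byte offset added to the integer
};

// The pointer-typed replacement: getelementptr i8, @base, offset
struct ByteGep {
    std::string base;
    std::int64_t offset;
};

// Rewrites the integer round trip as a GEP only when the pointer width is
// known and equals the intermediate integer width, so the cast pair neither
// truncates nor extends the address.
// pointerBits is std::nullopt when the module carries no datalayout.
std::optional<ByteGep> foldToGep(const IntToPtrOfAdd &expr,
                                 std::optional<unsigned> pointerBits) {
    assert(expr.intBits > 0 && "integer type must have a width");
    if (!pointerBits)
        return std::nullopt;  // pointer size unknown: the casts must stay
    if (expr.intBits != *pointerBits)
        return std::nullopt;  // widths differ: not a plain byte offset
    return ByteGep{expr.base, expr.offset};
}
```

Without a pointer width the function has to leave the cast pair in place, which is the situation a module without a datalayout is in.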

> Or do you mean that after the change conversions in the original input
> IR are more likely to be recognized?
>
>
>  My understanding is that the only current way to do this would be to not
>> specify a DataLayout.  (And hack a few places with built in assumptions.
>>  Let's ignore that for the moment.)  With your proposed change, would there
>> be a clean way to express something like this?
>>
>
>  I think your GC placement algorithm needs to handle inttoptr and
> ptrtoint, whichever way this discussion goes. Sorry. I'd be happy to hear
> others chime in -- I know I'm not an expert in this area or about GCs --
> but I don't find this rationale compelling.
>
> The key assumption I didn't initially explain is that the initial IR
> couldn't contain conversions.  With that added, do you still see concerns?
> I'm fairly sure I don't need to handle general ptr <-> int conversions.  If
> I'm wrong, I'd really like to know it.
>
So we met at the social and talked about this at length. I'll repeat most
of the conversation so that it's on the mailing list, and also I've had
some additional thoughts since then.

You're using the llvm type system to detect when something is a pointer,
and then you rely on knowing what's a pointer to deduce garbage collection
roots. We're supposed to have the llvm.gcroots intrinsic for this purpose,
but you note that it prevents gc roots from being in registers (they must
be in memory somewhere, usually on the stack), and that fixing it is more
work than is reasonable.

Your IR won't do any shifty pointer-int conversion shenanigans, and you
want some assurance that an optimization won't introduce them, or that if
one does then you can call it out as a bug and get it fixed. I think that's
reasonable, but I also think it's something we need to put forth before
llvm-dev.

Note that pointer-to-int conversions aren't necessarily just the
ptrtoint/inttoptr instructions (and constant expressions), there's also
casting between { i64 }* and { i8* }* and such. Are there legitimate
reasons an optz'n would introduce a cast? I think that anywhere in the
mid-optimizer, conflating integers and pointers is only going to be bad for
both the integer optimizations and the pointer optimizations.
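
To make the { i64 }* versus { i8* }* case concrete, here is a rough sketch of a check that flags a cast which reinterprets a pointer field as an integer or the reverse. The `Kind`, `Type`, `flatten`, and `castHidesPointer` names are hypothetical and do not correspond to LLVM's Type hierarchy; the sketch also assumes every scalar leaf has the same size, so leaves line up by index.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy type model, not LLVM's Type hierarchy.
enum class Kind { Int, Ptr, Struct };

struct Type {
    Kind kind;
    std::vector<Type> fields;  // used only when kind == Kind::Struct
};

// Appends the scalar leaves of t to out, in layout order.
void flatten(const Type &t, std::vector<Kind> &out) {
    assert(t.kind == Kind::Struct || t.fields.empty());
    if (t.kind == Kind::Struct) {
        for (const Type &f : t.fields)
            flatten(f, out);
    } else {
        out.push_back(t.kind);
    }
}

// Returns true when viewing memory of type `from` as type `to` would treat
// some pointer leaf as an integer, or some integer leaf as a pointer.
// Assumption: all leaves are the same size, so they line up by index.
bool castHidesPointer(const Type &from, const Type &to) {
    std::vector<Kind> a, b;
    flatten(from, a);
    flatten(to, b);
    const std::size_t n = a.size() < b.size() ? a.size() : b.size();
    for (std::size_t i = 0; i < n; ++i)
        if (a[i] != b[i])
            return true;
    return false;
}
```

A verification pass in the spirit of the one described earlier in the thread would need a check of this kind in addition to looking for ptrtoint and inttoptr.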

It may make sense as part of lowering -- suppose we find two alloca's, one
i64 and one i8* and find that their lifetimes are distinct, and i64 and i8*
are the same size, so we merge them. Because of how this would interfere, I
don't think this belongs anywhere in the mid-optimizer, it would have to
happen late, after lowering. That suggests that there's a point in the pass
pipeline where the IR is "canonical enough" that this will actually
work.
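
The alloca-merging example can be sketched as a small decision function. This is a toy model under stated assumptions: `Slot`, `lifetimesOverlap`, and `canShareStorage` are hypothetical names, lifetimes are modelled as inclusive ranges of program points, and the `mayMixPointersAndInts` flag stands in for "we are past the point in the pipeline where the IR must stay canonical".

```cpp
#include <cassert>

// Hypothetical toy model of a stack slot, not an LLVM data structure.
struct Slot {
    bool holdsPointer;   // true for an i8* slot, false for an i64 slot
    unsigned sizeBytes;  // allocation size
    int liveFrom;        // first program point where the slot is live
    int liveTo;          // last program point where it is live (inclusive)
};

// Two inclusive ranges overlap when each starts before the other ends.
bool lifetimesOverlap(const Slot &a, const Slot &b) {
    return a.liveFrom <= b.liveTo && b.liveFrom <= a.liveTo;
}

// Decides whether two slots may share one stack location.
// mayMixPointersAndInts is false in the mid-optimizer and true after
// lowering, when conflating the two kinds no longer hides roots from GC.
bool canShareStorage(const Slot &a, const Slot &b, bool mayMixPointersAndInts) {
    assert(a.liveFrom <= a.liveTo && b.liveFrom <= b.liveTo);
    if (a.sizeBytes != b.sizeBytes || lifetimesOverlap(a, b))
        return false;
    return mayMixPointersAndInts || a.holdsPointer == b.holdsPointer;
}
```

With the flag off, an i64 slot and an i8* slot are never merged even when their lifetimes are disjoint, which is the guarantee being asked for before gc-root insertion.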

Is that reasonable? Can we actually guarantee that, that any pass which
would break this goes after a common gc-root insertion spot? Do we need
(want?) to push back and say "no, sorry, make GC roots better
instead"?

Nick
>
>  p.s. From reading the mailing list a while back, I suspect that the SPIR
>> folks might have similar needs.  (i.e. hiding pointer sizes, etc..)  Pure
>> speculation on my part though.
>>
>
>  The SPIR spec specifies two target datalayouts, one for 32 bits and one
> for 64 bits.
>
> Good to know.  Thanks.
>
>
>  Nick
>
>    Philip

David Chisnall

2014-Feb-11 11:27 UTC

[LLVMdev] make DataLayout a mandatory part of Module

On 11 Feb 2014, at 01:25, Nick Lewycky <nlewycky at google.com> wrote:
> Your IR won't do any shifty pointer-int conversion shenanigans, and you
> want some assurance that an optimization won't introduce them, or that if
> one does then you can call it out as a bug and get it fixed. I think that's
> reasonable, but I also think it's something we need to put forth before
> llvm-dev.
>
> Note that pointer-to-int conversions aren't necessarily just the
> ptrtoint/inttoptr instructions (and constant expressions), there's also
> casting between { i64 }* and { i8* }* and such. Are there legitimate
> reasons an optz'n would introduce a cast? I think that anywhere in the
> mid-optimizer, conflating integers and pointers is only going to be bad for
> both the integer optimizations and the pointer optimizations.
>
> It may make sense as part of lowering -- suppose we find two alloca's, one
> i64 and one i8* and find that their lifetimes are distinct, and i64 and i8*
> are the same size, so we merge them. Because of how this would interfere, I
> don't think this belongs anywhere in the mid-optimizer, it would have to
> happen late, after lowering. That suggests that there's a point in the pass
> pipeline where the IR is "canonical enough" that this will actually work.
>
> Is that reasonable? Can we actually guarantee that, that any pass which
> would break this goes after a common gc-root insertion spot? Do we need
> (want?) to push back and say "no, sorry, make GC roots better instead"?

I am not currently working on GC, but I am working on a back end for an
architecture in which pointer integrity is enforced in hardware and pointers
and integers have different representations and so I would also find much of
this contract for optimisations useful.  Round tripping via an int involves
data loss on my architecture and having optimisations insert these can be
annoying (and break security properties).  I imagine that the situation is
similar for most software-enforced memory safety tools, not just GC.
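
As a rough illustration of the data loss described above, consider a toy pointer that carries bounds and a validity tag next to its address. Everything here is an assumption made for the sketch: `CapPointer` and its fields, `derive`, `toInteger`, `fromInteger`, and `mayAccess` are hypothetical and do not describe any specific architecture.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical pointer with metadata; the fields are illustrative only.
struct CapPointer {
    std::uint64_t address;
    std::uint64_t length;  // bytes the pointer is permitted to access
    bool valid;            // validity tag maintained alongside the pointer
};

// GEP analogue: pointer arithmetic that keeps the metadata intact.
CapPointer derive(const CapPointer &p, std::uint64_t offset) {
    assert(offset <= p.length && "offset must stay within bounds");
    return CapPointer{p.address + offset, p.length - offset, p.valid};
}

// ptrtoint analogue: only the address fits in the integer.
std::uint64_t toInteger(const CapPointer &p) { return p.address; }

// inttoptr analogue: bounds and tag cannot be recovered from the integer.
CapPointer fromInteger(std::uint64_t value) {
    return CapPointer{value, 0, false};
}

// A pointer may be used only if it is valid and the access is in bounds.
bool mayAccess(const CapPointer &p, std::uint64_t bytes) {
    return p.valid && bytes <= p.length;
}
```

Both routes reach the same address, but only the pointer obtained through `derive` is still usable, which is why an optimisation that swaps pointer arithmetic for an integer round trip is a correctness problem in such a model and not just a missed optimisation.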

David

Chandler Carruth

2014-Feb-11 11:37 UTC

[LLVMdev] make DataLayout a mandatory part of Module

On Tue, Feb 11, 2014 at 3:27 AM, David Chisnall <David.Chisnall at cl.cam.ac.uk> wrote:
> On 11 Feb 2014, at 01:25, Nick Lewycky <nlewycky at google.com> wrote:
>
> > Your IR won't do any shifty pointer-int conversion shenanigans, and you
> > want some assurance that an optimization won't introduce them, or that if
> > one does then you can call it out as a bug and get it fixed. I think
> > that's reasonable, but I also think it's something we need to put forth
> > before llvm-dev.
> >
> > Note that pointer-to-int conversions aren't necessarily just the
> > ptrtoint/inttoptr instructions (and constant expressions), there's also
> > casting between { i64 }* and { i8* }* and such. Are there legitimate
> > reasons an optz'n would introduce a cast? I think that anywhere in the
> > mid-optimizer, conflating integers and pointers is only going to be bad
> > for both the integer optimizations and the pointer optimizations.
> >
> > It may make sense as part of lowering -- suppose we find two alloca's,
> > one i64 and one i8* and find that their lifetimes are distinct, and i64
> > and i8* are the same size, so we merge them. Because of how this would
> > interfere, I don't think this belongs anywhere in the mid-optimizer, it
> > would have to happen late, after lowering. That suggests that there's a
> > point in the pass pipeline where the IR is "canonical enough" that this
> > will actually work.
> >
> > Is that reasonable? Can we actually guarantee that, that any pass which
> > would break this goes after a common gc-root insertion spot? Do we need
> > (want?) to push back and say "no, sorry, make GC roots better instead"?
>
> I am not currently working on GC, but I am working on a back end for an
> architecture in which pointer integrity is enforced in hardware and
> pointers and integers have different representations and so I would also
> find much of this contract for optimisations useful.  Round tripping via
> an int involves data loss on my architecture and having optimisations
> insert these can be annoying (and break security properties).  I imagine
> that the situation is similar for most software-enforced memory safety
> tools, not just GC.

While I find all of these things very interesting from the perspective of
security and/or hardware constraints, I don't think we should try to deal
with that here.

Today, even without a datalayout, I suspect LLVM is not providing nearly
the guarantee that either of these use cases is looking for. It may well
work by happenstance, but hope isn't a strategy. If we want to add this
constraint to LLVM, let's discuss that separately. I don't think we have it
today, and I don't think making datalayout mandatory meaningfully moves us
further from having it. At worst it causes already possible random failures
to become more common.

Philip Reames

2014-Feb-14 18:59 UTC

[LLVMdev] make DataLayout a mandatory part of Module

Nick,

Thanks for writing up the summary of our conversation.  I have a couple 
of small clarifications to make, but I'm going to move that into a 
separate thread since the discussion has largely devolved from the 
original topic.

To repeat my comment from last week, I support your proposed change 
w.r.t. DataLayout.

Philip


Philip Reames

2014-Feb-15 01:55 UTC

[LLVMdev] Pointer vs Integer classification (was Re: make DataLayout a mandatory part of Module)

Splitting out a conversation which started in "make DataLayout a 
mandatory part of Module" since the topic has decidedly changed. This 
also relates to the email "RFC: GEP as canonical form for pointer 
addressing" I just sent.

On 02/10/2014 05:25 PM, Nick Lewycky wrote:
> On 5 February 2014 09:45, Philip Reames <listmail at philipreames.com> wrote:
>
>
> So we met at the social and talked about this at length. I'll repeat 
> most of the conversation so that it's on the mailing list, and also 
> I've had some additional thoughts since then.
>
> You're using the llvm type system to detect when something is a 
> pointer, and then you rely on knowing what's a pointer to deduce 
> garbage collection roots.

Correct.

> We're supposed to have the llvm.gcroots intrinsic for this purpose,
> but you note that it prevents gc roots from being in registers (they
> must be in memory somewhere, usually on the stack), and that fixing it
> is more work than is reasonable.

This is slightly off, but probably close to what I actually said even if
not quite what I meant.  :)

I'm going to skip this and respond with a fuller explanation Monday.  
I'd written an explanation once, realized it was wrong, and decided I 
should probably revisit when fully awake.

Fundamentally, I believe that gc.roots could be made to work, even with 
decent (but not optimal) performance in the end.  We may even contribute 
some patches towards fixing issues with the gc.root mechanism just to 
make a fair comparison.  I just don't believe it's the right approach or
the best way to reach the end goal.

> Your IR won't do any shifty pointer-int conversion shenanigans, and
> you want some assurance that an optimization won't introduce them, or
> that if one does then you can call it out as a bug and get it fixed. I
> think that's reasonable, but I also think it's something we need to
> put forth before llvm-dev.

Correct and agreed.  I split this part off into a separate proposal
under the subject "RFC: GEP as canonical form for pointer addressing".

> Note that pointer-to-int conversions aren't necessarily just the 
> ptrtoint/inttoptr instructions (and constant expressions), there's 
> also casting between { i64 }* and { i8* }* and such. Are there 
> legitimate reasons an optz'n would introduce a cast? I think that 
> anywhere in the mid-optimizer, conflating integers and pointers is 
> only going to be bad for both the integer optimizations and the 
> pointer optimizations.
>
> It may make sense as part of lowering -- suppose we find two alloca's, 
> one i64 and one i8* and find that their lifetimes are distinct, and 
> i64 and i8* are the same size, so we merge them. Because of how this 
> would interfere, I don't think this belongs anywhere in the 
> mid-optimizer, it would have to happen late, after lowering. That 
> suggests that there's a point in the pass pipeline where the IR is 
> "canonical enough" that this will actually work.

I agree this is possible, even with my proposal.  In fact, we already 
have a stack colouring pass in tree which does exactly what your example 
illustrates.  However, this is done well after CodeGenPrepare and is 
thus after we start relaxing canonical form anyway.

A couple of other transforms which could potentially be problematic:
- load widening
- vectorization (when the vector element type loses the 'pointerness')

In each of these cases, we have clear ways of expressing the 
transformation in ways which preserve type information.  (e.g. struct 
types, vector element types, etc.)  I would hope we could move towards 
these cleaner representations.  (Note: I haven't checked the current 
implementations.  I should do so.)
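
As a rough illustration of the load-widening concern (a toy Python model, 
not LLVM code and not taken from any existing pass; the record layout and 
the names POINTER, COUNTER, and memory are assumptions made up for this 
sketch): once a pointer field and its integer neighbour are read as one 
wide integer, the pointer survives only as bits recovered by integer 
arithmetic, whereas two typed loads keep the fields distinct.

```python
import struct

# Hypothetical 16-byte record: an 8-byte pointer followed by an 8-byte integer.
POINTER = 0x7FFF_0000_1000
COUNTER = 42
memory = struct.pack("<QQ", POINTER, COUNTER)

# Type-preserving view: two separate loads, one per field.
typed_pointer, typed_counter = struct.unpack("<QQ", memory)

# Widened view: a single 128-bit integer load, split again with integer math.
wide = int.from_bytes(memory, "little")
low_bits = wide & (2**64 - 1)   # same bits as the pointer, but now "just an integer"
high_bits = wide >> 64          # the counter field
```

Both views produce the same bits; the difference is that nothing in the 
widened view records that low_bits was ever a pointer, which is the 
information a later safepoint-insertion phase would need.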

My view of this is that any optimization which lost type information in 
such a manner without good cause would be poor style to begin with.  I 
would hope that patches to remove such information loss would be 
accepted so long as there was a reasonable alternative. (I'm assuming 
this is already true; if it's not, let me know.)

(In case it's not clear, being past CodeGenPrepare and lowering for a 
specific target would be a "good reason".)
>
> Is that reasonable? Can we actually guarantee that, that any pass 
> which would break this goes after a common gc-root insertion spot? Do 
> we need (want?) to push back and say "no, sorry, make GC roots better 
> instead"?

I think it is, but am open to being convinced otherwise.  :)

Philip

Philip Reames

2014-Feb-18 19:55 UTC

[LLVMdev] Pointer vs Integer classification (was Re: make DataLayout a mandatory part of Module)

On 02/14/2014 05:55 PM, Philip Reames wrote:
> On 02/10/2014 05:25 PM, Nick Lewycky wrote:
>>
>> Note that pointer-to-int conversions aren't necessarily just the 
>> ptrtoint/inttoptr instructions (and constant expressions), there's 
>> also casting between { i64 }* and { i8* }* and such. Are there 
>> legitimate reasons an optz'n would introduce a cast? I think that 
>> anywhere in the mid-optimizer, conflating integers and pointers is 
>> only going to be bad for both the integer optimizations and the 
>> pointer optimizations.
>>
>> It may make sense as part of lowering -- suppose we find two 
>> alloca's, one i64 and one i8* and find that their lifetimes are 
>> distinct, and i64 and i8* are the same size, so we merge them. 
>> Because of how this would interfere, I don't think this belongs 
>> anywhere in the mid-optimizer, it would have to happen late, after 
>> lowering. That suggests that there's a point in the pass pipeline 
>> where the IR is "canonical enough" that this will actually work.
> I agree this is possible, even with my proposal.  In fact, we already 
> have a stack colouring pass in tree which does exactly what your 
> example illustrates.  However, this is done well after CodeGenPrepare 
> and is thus after we start relaxing canonical form anyway.
>
> A couple of other transforms which could potentially be problematic:
> - load widening
> - vectorization (when the vector element type loses the 'pointerness')
>
> In each of these cases, we have clear ways of expressing the 
> transformation in ways which preserve type information.  (e.g. struct 
> types, vector element types, etc.)  I would hope we could move 
> towards these cleaner representations.  (Note: I haven't checked the 
> current implementations.  I should do so.)
>
> My view of this is that any optimization which lost type information 
> in such a manner without good cause would be poor style to begin 
> with.  I would hope that patches to remove such information loss would 
> be accepted so long as there was a reasonable alternative.  (I'm 
> assuming this is already true; if it's not, let me know.)
>
> (In case it's not clear, being past CodeGenPrepare and lowering for a 
> specific target would be a "good reason".)

One thing I thought of over the weekend: all of the transformations 
discussed above are already illegal unless they explicitly preserve the 
address space of the pointer.

That doesn't prevent them from existing, but it does increase the odds 
that they're already buggy.

Philip

Philip Reames

2014-Feb-21 18:37 UTC

[LLVMdev] Pointer vs Integer classification (was Re: make DataLayout a mandatory part of Module)

On 02/14/2014 05:55 PM, Philip Reames wrote:
> Splitting out a conversation which started in "make DataLayout a 
> mandatory part of Module" since the topic has decidedly changed. This 
> also relates to the email "RFC: GEP as canonical form for pointer 
> addressing" I just sent.
>
> On 02/10/2014 05:25 PM, Nick Lewycky wrote:
>> ...
>>
>> We're supposed to have the llvm.gcroots intrinsic for this purpose,
>> but you note that it prevents gc roots from being in registers (they 
>> must be in memory somewhere, usually on the stack), and that fixing 
>> it is more work than is reasonable.
> This is slightly off, but probably close to what I actually said even 
> if not quite what I meant.  :)
>
> I'm going to skip this and respond with a fuller explanation Monday.  
> I'd written an explanation once, realized it was wrong, and decided I 
> should probably revisit when fully awake.
>
> Fundamentally, I believe that gc.roots could be made to work, even 
> with decent (but not optimal) performance in the end.  We may even 
> contribute some patches towards fixing issues with the gc.root 
> mechanism just to make a fair comparison.  I just don't believe it's 
> the right approach or the best way to reach the end goal.

So, not quite on Monday, but I did get around to writing up an 
explanation of what's wrong with using gcroot.  It turned out to be much 
longer than I expected, so I turned it into a blog post:
http://www.philipreames.com/Blog/2014/02/21/why-not-use-gcroot/

The very short version: gcroot loses roots (for any GC) due to bad 
interaction with the optimizer, and gcroot doesn't capture all copies of 
a pointer root, which fundamentally breaks collectors that relocate 
roots.  The only way I know to make gcroot (in its current form) work 
reliably for all collectors is to insert safepoints very early, which 
has highly negative performance impacts.  There are some (potentially) 
cheaper but ugly hacks available if you don't need to relocate roots.
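
To make the relocation problem concrete, here is a minimal sketch in 
Python (a toy model only, not LLVM or gcroot code; ToyHeap, alloc, 
collect, and the slot representation are all hypothetical names invented 
for this illustration). The collector moves every object reachable from 
the slots it is told about and rewrites those slots, so any copy of the 
pointer that was not reported keeps the old, now invalid, address.

```python
class ToyHeap:
    """A toy heap whose collector moves every live object to a new address."""

    def __init__(self):
        self.objects = {}        # address -> payload
        self.next_addr = 0x1000

    def alloc(self, payload):
        addr = self.next_addr
        self.next_addr += 0x10
        self.objects[addr] = payload
        return addr

    def collect(self, root_slots):
        """Relocate objects named by root_slots and update those slots in place."""
        forwarding = {}          # old address -> new address
        moved = {}
        for slot in root_slots:
            old = slot[0]
            if old not in forwarding:
                forwarding[old] = self.next_addr
                moved[self.next_addr] = self.objects[old]
                self.next_addr += 0x10
            slot[0] = forwarding[old]
        self.objects = moved     # old addresses are no longer valid


heap = ToyHeap()
declared_root = [heap.alloc("payload")]  # slot reported to the collector
hidden_copy = [declared_root[0]]         # second copy the collector never sees

heap.collect([declared_root])

declared_ok = declared_root[0] in heap.objects  # slot was updated by the collector
hidden_ok = hidden_copy[0] in heap.objects      # stale: still holds the old address
```

A non-relocating collector would tolerate the unreported copy as long as 
one declared root kept the object alive, which is why the problem is 
specific to collectors that move objects.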

There's also going to be a follow-up post on implementation problems, 
but that's completely separate from the fundamental problems.

Philip

