thr3ads.net - llvm dev - [llvm-dev] lifetime.start/end [Jan 2021]


Ralf Jung via llvm-dev

2021-Jan-24 16:07 UTC

[llvm-dev] lifetime.start/end

Hi Johannes,
> At some point I lost track of what you even want to achieve.
*lol* fair.^^
All I want is a precise and meaningful spec for lifetime.start and lifetime.end.
That's where we entered this rabbit hole. You didn't like Juneyoung's and Nuno's
proposal for this and instead suggested to replace the entire system by
something else, and that new system is what we are discussing now (that is my
understanding, anyway).
> Your initial problem goes away if we only coalesce and fold comparisons
> between allocations for which the addresses are not observed.
> Let's do that and thereby benefit from all the good things we can get from
> lifetime markers and coalescing, for alloca/malloc/...
> Let's not go down the brittle road of syntactic limitations. It doesn't allow
> us to do a lot of things and will come with a price as we continue.
>
> FWIW, I think some part of the misconception is that two allocas "will
> definitely get different addresses" which you argue "heavily restricts the
> possible allocation choices compared to the original C program".
> The addresses are different but that is only interesting if you observe them.
> Take the C code: `{int a,b, c = (&a == &b); use(a,c);}`. I can happily coalesce
> `a` and `b` after the comparison was folded, or `b` and `c` even if the
> comparison is still in place. For neither transformation I need
> lifetime markers to do it in IR.
Sure, everything in the semantics is up to what the program actually observes. 
But I was considering the general case, in which it will not be possible to 
prove that some address is not observed. I expect that to be quite common 
(certainly, Rust programs really like to take references to local variables and 
pass them to other functions), and in those cases, not having lifetime.start/end
will severely restrict program transformations -- as you observed, even 
outlining is affected.

This is basically the abstract version of the concrete concern that James raised.
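
To make the general case concrete, here is a minimal C sketch. The names `opaque`, `two_scopes`, and `n_seen` are made up for illustration and do not come from this thread; `opaque` stands in for any function the optimizer cannot see into.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a function the optimizer cannot analyze.
   It only counts how many addresses it was handed. */
size_t n_seen;
void opaque(const int *p) {
    assert(p != NULL);
    n_seen++;
}

void two_scopes(void) {
    { int x = 1; opaque(&x); }  /* address of x escapes; x is dead afterwards */
    { int y = 2; opaque(&y); }  /* y's scope does not overlap with x's */
}
```

Both addresses escape into `opaque`, so an analysis that relies only on "the address is not observed" cannot justify giving `x` and `y` the same slot. Scope-derived lifetime information is what would permit the reuse here.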

Kind regards,
Ralf
> A lot of my arguments around alloca/malloc/... is that we can augment the
> information from the frontend by determining if an address is observed or not.
> If it is not, we can limit its allocation, coalesce it, etc. Lifetime markers
> can help us do this but they are not the only way.
> So, basically, we do not need lifetime markers to coalesce alloca/malloc/...
> but they can help.
>
> Let's try to get out of this rabbit hole. If you have a problem with the
> proposal as outlined in
> https://lists.llvm.org/pipermail/llvm-dev/2021-January/147594.html , please let
> me know. I assume dealing with observed addresses differently than with
> non-observed ones (implied by noescape) is the way forward now.
> 
> ~ Johannes
> 
> 
> 
>>
>> Kind regards,
>> Ralf
>>
>>
>>>
>>>
>>>>
>>>> Kind regards,
>>>> Ralf
>>>>
>>>>>
>>>>> Does that make some sense?
>>>>>
>>>>> ~ Johannes
>>>>>
>>>>>
>>>>>>
>>>>>> Kind regards,
>>>>>> Ralf
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 1/7/21 10:12 PM, Juneyoung Lee wrote:
>>>>>>>> On Fri, Jan 8, 2021 at 8:54 AM Johannes Doerfert
>>>>>>>> <johannesdoerfert at gmail.com> wrote:
>>>>>>>>
>>>>>>>>> On 1/7/21 4:55 PM, Juneyoung Lee wrote:
>>>>>>>>>> It is, alloca must have its integral address decided at its
>>>>>>>>>> *allocation* time as well, like 0xFF00... .
>>>>>>>>>>
>>>>>>>>>> For example, if 0xFFFF0000 ~ 0xFF000000 is already allocated, the new
>>>>>>>>>> alloca cannot be placed there unless it is not observed in the
>>>>>>>>>> future, according to your proposal.
>>>>>>>>>>
>>>>>>>>>> But how do we know from the current state whether the stack variable
>>>>>>>>>> is going to be observed or not?
>>>>>>>>> With that argument you could allocate all but 4 bytes, do a malloc(4)
>>>>>>>>> and know the address of the returned pointer (assuming it is not
>>>>>>>>> null).
>>>>>>>>>
>>>>>>>>> What I try to say is, either your scenario is part of the model and
>>>>>>>>> everything we do is broken as you could "observe" addresses passively,
>>>>>>>>> *or*, the abstract machine we use for semantic reasoning doesn't
>>>>>>>>> permit the above reasoning. I really hope for the latter.
>>>>>>>>>
>>>>>>>> That's a very valid point..!
>>>>>>>>
>>>>>>>> Actually, I have a memory model that addresses this problem.
>>>>>>>> The gist is that we can interpret each allocation instruction as
>>>>>>>> *creating 2 blocks* and nondeterministically returning one of them.
>>>>>>>> The other one is marked as invalid but not freed immediately.
>>>>>>>> Deallocation frees both blocks.
>>>>>>>> If there is not enough space for having two blocks, it is treated as
>>>>>>>> out-of-memory.
>>>>>>>>
>>>>>>>> It makes guessing the address of an allocation according to memory
>>>>>>>> layout invalid (UB).
>>>>>>>>
>>>>>>>> p = malloc(4)
>>>>>>>> // If p != null, it is guaranteed there were at least two 4 byte slots
>>>>>>>> // available, say 0x100~0x104 and 0x110~0x114.
>>>>>>>> // Two blocks are allocated at 0x100 and 0x110, and one of the
>>>>>>>> // addresses is nondeterministically returned.
>>>>>>>> // By the nature of nondeterminism, all executions below should be
>>>>>>>> // well-defined regardless of the address of p.
>>>>>>>> *(int*)0x100 = 10; // In an execution where p is 0x110, this raises UB.
>>>>>>>>
>>>>>>>> The paper link is here <https://dl.acm.org/doi/10.1145/3276495> FYI.
>>>>>>>
>>>>>>> Nice! Thanks for the link :)
>>>>>>>
>>>>>>>
>>>>>>>> To argue differently: Who is to say there is a
stack, or only one,
>>>>>>>>> or that alloca allocates memory "on
the one stack"? That is not part of
>>>>>>>>> the IR, IMHO. I can write a machine on
which alloca lowers to malloc,
>>>>>>>>> I don't even need the free during stack
unwind but that I could do as
>>>>>>>>> well if I wanted to.
>>>>>>>>
>>>>>>>> This is right, the example was for illustrative
purposes. IR does not
>>>>>>>> enforce alloca to be placed at 'stack'.
>>>>>>>>
>>>>>>>>
>>>>>>>>>> Making icmp/ptrtoint yield poison will
still make loop versioning or
>>>>>>>>>> pointer rewriting transformations
unsound because these operations now
>>>>>>>>> can
>>>>>>>>>> create poison (even if pointers are
noundef).
>>>>>>>>> I did not say they yield poison, at least I
did not try to say that.
>>>>>>>>> What are you referring to exactly?
>>>>>>>>>
>>>>>>>> I was referring to this:
>>>>>>>>
>>>>>>>>>> But this makes ptrtoint/icmp make
UB-raising instructions, which
>>>>>>>>>> contradicts with what LLVM does.
>>>>>>>>> As with other violation of attributes I
would, on first though, suggest
>>>>>>>>> to produce poison, not UB.
>>>>>>>> But it is more about the (imaginary) attribute,
so maybe I was slightly out
>>>>>>>> of topic.
>>>>>>>>
>>>>>>>>> 2) is fine, I think the suggestion
semantically makes sense perfectly. 1)
>>>>>>>>>> is something I'm concerned about
now.
>>>>>>>>>>
>>>>>>>>>> There are more than pointer foldings,
such as rewriting such expression,
>>>>>>>>>> code motion ptr cmp, introduce ptr cmp,
etc. There is also analysis
>>>>>>>>> relying
>>>>>>>>>> on ptr cmp.
>>>>>>>>>> Defining the correctness of each of
them is something we want to avoid,
>>>>>>>>> and
>>>>>>>>>> maybe that's why we want to define
precise semantics for things.
>>>>>>>>> I don't get the point. My proposal does
not change the semantics of
>>>>>>>>> pointer comparisons, at all. I explicitly
mentioned that in the last
>>>>>>>>> email.
>>>>>>>>>
>>>>>>>> Oh okay, I thought it was a part of the
lifetime proposal, but it seems
>>>>>>>> more like a separate thing.
>>>>>>>> I agree that this requires performance impact.
>>>>>>>> Also investigation of existing transformations
would be needed; Alive2's
>>>>>>>> pointer comparison is doing approximation yet,
but if it becomes fully
>>>>>>>> precise, it will show something from running
LLVM unit tests I
>>>>>>>> believe..! :)
>>>>>>>>
>>>>>>>>> I think this will be less aggressive and
may give nice feedback to
>>>>>>>>>> potential projects that are using
lifetime with non-alloca.
>>>>>>>>> The lifetime marker debate, basically 2)
above, is orthogonal to the
>>>>>>>>> problem
>>>>>>>>> you try to solve. It got mixed in as
lifetime markers were used by
>>>>>>>>> StackColoring
>>>>>>>>> to perform coalescing but that is
coincidental. You can (arguably)
>>>>>>>>> coalesce stack
>>>>>>>>> allocations regardless of lifetime markers
and with 1) such a
>>>>>>>>> transformation
>>>>>>>>> (w/ and w/o lifetime markers) would
actually be sound.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> At the end, we can implement IR writer
that lowers lifetime with
>>>>>>>>> non-alloca
>>>>>>>>>> into memset(undef). WDYT?
>>>>>>>>> Yeah, 2) is orthogonal and we can lower it
that way. Unsure if it is
>>>>>>>>> helpful
>>>>>>>>> but we can certainly define it that way in
the LangRef.
>>>>>>>>>
>>>>>>>> Okay, thanks!
>>>>>>>>
>>>>>>>>
>>>>>>>>> ~ Johannes
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Juneyoung
>>>>>>>>>>
>>>>>>>>>> p.s. The reply was late, sorry. I think
I can spend more time on this
>>>>>>>>> today.
>>>>>>>>>> On Thu, Jan 7, 2021 at 9:02 AM Johannes
Doerfert <
>>>>>>>>> johannesdoerfert at gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> On 1/6/21 4:33 PM, Juneyoung Lee wrote:
>>>>>>>>>>>>>> Stepwisely defining the semantics of instructions is a desirable
>>>>>>>>>>>>>> direction IMO.
>>>>>>>>>>>>> I'm confused. What in the proposal would prevent us from defining
>>>>>>>>>>>>> the semantics of instructions, or force us to do it in an
>>>>>>>>>>>>> "undesirable way"?
>>>>>>>>>>>>
>>>>>>>>>>>> I meant it would be great if the output state after executing an
>>>>>>>>>>>> instruction can be described using its input state.
>>>>>>>>>>>> (that was the meaning of 'stepwise semantics', I should have been
>>>>>>>>>>>> more clear about this)
>>>>>>>>>>>>
>>>>>>>>>>>> For example, the semantics of 'z = add x y' can be defined as follows:
>>>>>>>>>>>> Given an input state s, next state s' = s[z -> s(x) + s(y)]
>>>>>>>>>>>> where s(x) is the value of x in the previous state, and s[z -> v] is
>>>>>>>>>>>> a state with z updated to v.
>>>>>>>>>>>>
>>>>>>>>>>>> Another example that involves memory: the semantics of
>>>>>>>>>>>> 'i = ptrtoint p' can be defined as follows:
>>>>>>>>>>>> Given an input state s, next state
>>>>>>>>>>>> s' = s[i -> s(p).obj.address + s(p).offset]
>>>>>>>>>>>> where obj.address is the begin address of a memory object obj
>>>>>>>>>>>> pointed by p & offset is p's byte offset. (Imagine a pointer to the
>>>>>>>>>>>> offset of some char array).
>>>>>>>>>>>> Note that ptrtoint & add can be nicely defined w.r.t the input state.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Now, the instruction that we're facing is 'p = alloca'.
>>>>>>>>>>>> To describe the output state after executing 'p = alloca', the
>>>>>>>>>>>> address of new alloca should be determined.
>>>>>>>>>>>> If observedness is involved, we need to know the future state
>>>>>>>>>>>> again. :/ We don't know whether the alloca is going to be observed
>>>>>>>>>>>> or not without seeing the future.
>>>>>>>>>>>> This is the problem of the current lifetime intrinsics as well.
>>>>>>>>>>> No, you mix things up here.
>>>>>>>>>>>
>>>>>>>>>>> Nobody proposed to modify the semantics of `alloca`.
>>>>>>>>>>> `alloca` provides you with a fresh, unobserved block of
>>>>>>>>>>> dereferenceable memory that is implicitly freed as the stack
>>>>>>>>>>> unwinds. That is it. No context necessary.
>>>>>>>>>>>
>>>>>>>>>>> If you want to modify the IR, you need to argue the observable
>>>>>>>>>>> semantics which is nothing new. That this might require more than
>>>>>>>>>>> a peephole view of the program is also not new.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> One possible approach to resolve this is adding an 'unobserved' flag
>>>>>>>>>>>> to alloca instruction (similar to what was suggested by Nicolai).
>>>>>>>>>>>> And, we can say that if alloca with 'unobserved' is used by
>>>>>>>>>>>> ptrtoint/icmp, it is UB.
>>>>>>>>>>> The flag can be added, like we add other attributes. It should not
>>>>>>>>>>> be required for any optimization we talked about though. It basically
>>>>>>>>>>> is a way to manifest derived or given information into the IR.
>>>>>>>>>>>
>>>>>>>>>>> Attribute deduction, as well as frontends with domain knowledge,
>>>>>>>>>>> can add such information. The flag we discussed in phab was not even
>>>>>>>>>>> sufficient for all the transformation examples I presented in my mail,
>>>>>>>>>>> that is why I extended my argument. We could still have a "noescape"
>>>>>>>>>>> flag for allocas, but I'm not sure how useful that really is. We can
>>>>>>>>>>> certainly deduce it and manifest it, unsure if we have domain knowledge
>>>>>>>>>>> we can use for non-trivial cases though.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> But this makes ptrtoint/icmp make UB-raising instructions, which
>>>>>>>>>>>> contradicts with what LLVM does.
>>>>>>>>>>> As with other violation of attributes I would, on first thought,
>>>>>>>>>>> suggest to produce poison, not UB.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> Also, existing optimizations like loop versioning can introduce
>>>>>>>>>>>> ptrtoint/pointer comparisons too.
>>>>>>>>>>> Sure. I am not certain why that is a problem. I get the feeling things
>>>>>>>>>>> are still mixed up here.
>>>>>>>>>>>
>>>>>>>>>>> What I proposed is twofold:
>>>>>>>>>>>
>>>>>>>>>>> 1) We stop folding comparisons between different allocas if changing
>>>>>>>>>>>    the address of both might be observable. Thus, if both might have
>>>>>>>>>>>    their address "taken"/escaped, other than the comparisons we want
>>>>>>>>>>>    to fold, we cannot proceed.
>>>>>>>>>>>
>>>>>>>>>>> 2) We define lifetime markers to mean `memset(undef)`.
>>>>>>>>>>>
>>>>>>>>>>> The first should be sufficient for the problem you were trying to
>>>>>>>>>>> solve in the first place. The second makes lifetime markers less
>>>>>>>>>>> weird. Note that 1) is not changing the semantics of the IR. We
>>>>>>>>>>> basically just argue there is a bug in our instcombine right now as
>>>>>>>>>>> we do not check all necessary preconditions.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> I see that there are other
questions that I didn't answer yet, but let
>>>>>>>>> me
>>>>>>>>>>>> answer this first to limit the
length of the text :)
>>>>>>>>>>> Sure, we can split the discussion
:)
>>>>>>>>>>>
>>>>>>>>>>> ~ Johannes
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Juneyoung
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Jan 7, 2021 at 3:36 AM
Johannes Doerfert <
>>>>>>>>>>> johannesdoerfert at gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> On 1/5/21 8:00 PM,
Juneyoung Lee wrote:
>>>>>>>>>>>>>> Hi Johannes,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I read your proposal
and thought about the model.
>>>>>>>>>>>>> Cool, thanks!
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> As you concerned in A3, certain programs may be valid only when
>>>>>>>>>>>>>> memory blocks with overlapping lifetimes have disjoint addresses.
>>>>>>>>>>>>>> Look at this example (I'm using malloc, but alloca also works):
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> p1 = malloc(4)
>>>>>>>>>>>>>> p2 = malloc(4) // for brevity, assume that there was enough space
>>>>>>>>>>>>>>                // & p1 and p2 != null
>>>>>>>>>>>>>> set<char*> s;
>>>>>>>>>>>>>> s.insert(p1);
>>>>>>>>>>>>>> s.insert(p2); // If the second insert did nothing, it would be
>>>>>>>>>>>>>>               // a surprise to programmers
>>>>>>>>>>>>>> work(s);
>>>>>>>>>>>>>> free(p1)
>>>>>>>>>>>>>> free(p2)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Clearly, IR semantics should guarantee that escaped blocks are
>>>>>>>>>>>>>> disjoint. It would be great for verification tools on LLVM IR to
>>>>>>>>>>>>>> be able to answer that the second insert will succeed.
>>>>>>>>>>>>> I agree, the second insert
should succeed, assuming `p1 && p2`.
>>>>>>>>>>>>> I don't think my
proposal would in any way impact the program above,
>>>>>>>>>>>>> if anything the A3
reasoning makes sure such a program with allocas
>>>>>>>>>>>>> is not miscompiled.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I'm also not sure I
understand what you try to argue for. Maybe
>>>>>>>>>>>>> elaborate a bit what it is
you think is bad or needs to be changed.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> The problem is that
definition of escapedness is not clear at the
>>>>>>>>>>>>> semantic
>>>>>>>>>>>>>> level. Describing the
IR semantics w.r.t. LLVM's escaped analysis
>>>>>>>>>>>>> function
>>>>>>>>>>>>>> isn't something we
want.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Semantic definition of
escapedness of a pointer seems hard, I mean
>>>>>>>>> in a
>>>>>>>>>>>>>> stepwise manner.
>>>>>>>>>>>>>> It isn't a single
instruction such as 'escape i8* ptr', and we need
>>>>>>>>> to
>>>>>>>>>>>>> look
>>>>>>>>>>>>>> over all instructions
in the function. For example, '(int)(p+1) -
>>>>>>>>>>> (int)p'
>>>>>>>>>>>>>> isn't semantically
escaping the pointer p because the result is 1
>>>>>>>>>>>>>> regardless of the value
of p.
>>>>>>>>>>>>>> Stepwisely defining the
semantics of instructions is a desirable
>>>>>>>>>>>>> direction
>>>>>>>>>>>>>> IMO.
>>>>>>>>>>>>> I'm confused. What in
the proposal would prevent us from defining
>>>>>>>>>>>>> the semantics of
instructions, or force us to do it in an "undesirable
>>>>>>>>>>>>> way"?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> In practice,
syntactically checking escapedness + nice engineering
>>>>>>>>>>> might
>>>>>>>>>>>>>> not break optimizations
in most cases (as undef/poison did); but it
>>>>>>>>>>> would
>>>>>>>>>>>>>> be great to move to
another level, since LLVM IR is used in so many
>>>>>>>>>>>>> places
>>>>>>>>>>>>>> :)
>>>>>>>>>>>>> The property under which
you can coalesce objects is simple:
>>>>>>>>>>>>>       It is not observable.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Now, if the address of one
of the two objects you coalesce is not
>>>>>>>>>>>>> observed, coalescing is not
observable. That is a sufficient condition
>>>>>>>>>>>>> not a necessary one.
Pointer "escaping" is one step further. If the
>>>>>>>>>>>>> address doesn't escape
it is not observed. This does not mean the
>>>>>>>>>>>>> "semantic conditions
for coalescing", e.g., for the purpose of
>>>>>>>>>>> translation
>>>>>>>>>>>>> validation, is supposed to
be build on top of our "definition of
>>>>>>>>>>> escaping
>>>>>>>>>>>>> pointers". That said,
we use "does not escape" as a precondition for
>>>>>>>>>>>>> various transformation and
I'm unsure what is any different now. The
>>>>>>>>>>>>> entire escape argument is
only used in the validity of the pointer
>>>>>>>>>>> folding.
>>>>>>>>>>>>> Similarly, we can fold a
comparison of a noalias pointer with another
>>>>>>>>>>> one
>>>>>>>>>>>>> if the former does not
escape (and both are dereferenced and one is
>>>>>>>>>>>>> written for sure).
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> The pointer comparison
is another beast. It indeed has a few issues,
>>>>>>>>>>> and
>>>>>>>>>>>>>> solving it might
require nontrivial solution.
>>>>>>>>>>>>> I think the main problem of
the inconsistencies and such we've seen is
>>>>>>>>>>>>> rooted in the erroneous
folding of pointer comparisons. Cleaning up
>>>>>>>>> the
>>>>>>>>>>>>> lifetime marker semantics
is actually unrelated and simply not folding
>>>>>>>>>>>>> as described in A3 should
solve the issue that has been reported.
>>>>>>>>>>>>> Nevertheless,
>>>>>>>>>>>>> we should take a crack at
lifetime markers while we are here.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> ~ Johannes
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Juneyoung
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Jan 5, 2021 at
9:37 AM Johannes Doerfert <
>>>>>>>>>>>>> johannesdoerfert at
gmail.com>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Juneyoung,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Happy new year :)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> After we had a lengthy discussion on phab last year, I've tried to
>>>>>>>>>>>>>>> summarize my thoughts, especially given that I had some time to
>>>>>>>>>>>>>>> think about things over the break. Still, no promises on the
>>>>>>>>>>>>>>> quality ;)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I start with general questions I asked myself and then go on
>>>>>>>>>>>>>>> rambling about a potential design.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Q1: Is lifetime a given property or a derived one, thus is it fixed
>>>>>>>>>>>>>>> or modifiable?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This is a question I asked myself a lot recently. I think it is
>>>>>>>>>>>>>>> derived and modifiable, at least I hope it is. Only that would allow
>>>>>>>>>>>>>>> transformations I would want us to do. Here are some examples:
>>>>>>>>>>>>>>>       https://godbolt.org/z/G8obj3
>>>>>>>>>>>>>>>       https://godbolt.org/z/obaTc
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Q2: Is a pointer comparison, or similar use, extending the lifetime?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Asked differently, can we hoist a pointer comparison into a region
>>>>>>>>>>>>>>> where the pointer is dead?
>>>>>>>>>>>>>>> This is an important question which we haven't discussed much as we
>>>>>>>>>>>>>>> assumed LICM has to work.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The current behavior is that non-dereferencing uses are not
>>>>>>>>>>>>>>> extending the lifetime and are allowed outside of "lifetime regions"
>>>>>>>>>>>>>>> (as indicated by markers). They will always produce valid results.
>>>>>>>>>>>>>>> Though, we might want to think about a lifetime marker that spits
>>>>>>>>>>>>>>> out a new pointer value instead of reusing the old one.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Q3: Can we use lifetime to argue about addresses?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The question here is, can we fold address comparisons based on
>>>>>>>>>>>>>>> lifetimes, or not.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> So far, we fold comparisons based on "address information". For
>>>>>>>>>>>>>>> example, we "know" globals, allocas, and mallocs cannot be equal to
>>>>>>>>>>>>>>> one another. Also, two distinct allocations, for globals and
>>>>>>>>>>>>>>> allocas, are considered unequal. Now, the crux is that we have to be
>>>>>>>>>>>>>>> consistent if we do two comparisons, and, as of now, we are not (bug
>>>>>>>>>>>>>>> number missing). Since the backend (or any other place for that
>>>>>>>>>>>>>>> matter) might coalesce allocas, their addresses might not be
>>>>>>>>>>>>>>> different after all. If we already folded a comparison to "unequal"
>>>>>>>>>>>>>>> we are doomed if we later have a comparison that results in "equal".
>>>>>>>>>>>>>>> (Note, this is different from aliasing rules as they can be
>>>>>>>>>>>>>>> stricter.)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Design:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I would hope we can come up with a model that treats memory "the
>>>>>>>>>>>>>>> same", regardless if it is global, stack, or heap. I want to avoid
>>>>>>>>>>>>>>> special casing one of them wrt. lifetime as I believe most
>>>>>>>>>>>>>>> optimizations would apply to any of them, potentially for different
>>>>>>>>>>>>>>> reasons and with different gains, but nevertheless.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Proposal (largely based on the conversation in phab):
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> A1: Lifetime is a concept that talks about memory content *only*.
>>>>>>>>>>>>>>>     Basically, the memory content is set to undefined by lifetime
>>>>>>>>>>>>>>>     markers. It is derived/modifiable as it is just an "as-is"
>>>>>>>>>>>>>>>     property of the memory content. The lifetimes of an object, as
>>>>>>>>>>>>>>>     described by markers, might change during the compilation. They
>>>>>>>>>>>>>>>     might become smaller if we deduce the object is not accessed and
>>>>>>>>>>>>>>>     the memory content is not used, they might become larger if
>>>>>>>>>>>>>>>     objects with non-overlapping lifetimes are coalesced. (One could
>>>>>>>>>>>>>>>     see the latter as introducing a new object though.)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> A2: If we define lifetime as above, it has nothing to do with the
>>>>>>>>>>>>>>>     address of an object. Consequently, pointer comparisons and
>>>>>>>>>>>>>>>     similar operations are valid outside the lifetime. Loads and
>>>>>>>>>>>>>>>     stores are as well, they can even not be removed "easily". A
>>>>>>>>>>>>>>>     store followed by a lifetime marker or a load following a
>>>>>>>>>>>>>>>     lifetime marker is dead or results in undef respectively.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> A3: We could not use lifetime to argue about addresses. This means we
>>>>>>>>>>>>>>>     could/should also not argue that overlapping lifetimes result in
>>>>>>>>>>>>>>>     different addresses. Thus, a comparison between the address of two
>>>>>>>>>>>>>>>     allocas could not immediately be folded. That said, there would be
>>>>>>>>>>>>>>>     special cases though. Basically, if one of the allocas does not
>>>>>>>>>>>>>>>     escape, other than the comparisons to be folded, we can fold
>>>>>>>>>>>>>>>     them. Afterwards, coalescing or splitting would still be
>>>>>>>>>>>>>>>     consistent because it is unobservable. The problem we have
>>>>>>>>>>>>>>>     in-tree is that we fold even though the address is still observed
>>>>>>>>>>>>>>>     (after the fold). It is still unclear to me what the impact of
>>>>>>>>>>>>>>>     this would be on real code. I suggested before that we run some
>>>>>>>>>>>>>>>     experiments first before we make any decision whatsoever.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This is pretty much saying that lifetime markers are
>>>>>>>>>>>>>>> `memset(undef)`, as you suggested before (I think).
>>>>>>>>>>>>>>> There are some implementation level differences but at the end of
>>>>>>>>>>>>>>> the day they are basically the same.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Happy to hear some thoughts on this, especially if it fixes the
>>>>>>>>>>>>>>> problems that lead to D93376 in the first place.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> ~ Johannes
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 12/18/20 2:42
AM, Juneyoung Lee via llvm-dev wrote:
>>>>>>>>>>>>>>>> Hello all,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> We're
discussing the well-formedness of lifetime.start/end
>>>>>>>>> intrinsic
>>>>>>>>>>>>>>> here (
>>>>>>>>>>>>>>>>
https://reviews.llvm.org/D93376), deciding what is a
>>>>>>>>> (syntactically
>>>>>>>>>>> &
>>>>>>>>>>>>>>>> semantically)
valid use of these intrinsics.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I'd like to
gather more context about the intrinsics.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 1. Is there a
frontend or framework that introduces lifetime call
>>>>>>>>>>> with
>>>>>>>>>>>>>>>> non-stack
allocated objects?
>>>>>>>>>>>>>>>> If exists,
which behavior do they expect from it?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 2. Can a
block's lifetime be started bytewise or elementwise?
>>>>>>>>>>>>>>>> I imagine an
optimization that allow using stack very compactly,
>>>>>>>>> but
>>>>>>>>>>>>>>> wonder
>>>>>>>>>>>>>>>> there is a
real-world use case.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>> Juneyoung
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
_______________________________________________
>>>>>>>>>>>>>>>> LLVM Developers
mailing list
>>>>>>>>>>>>>>>> llvm-dev at
lists.llvm.org
>>>>>>>>>>>>>>>>
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>>>>>
>>>>>>
>>>>
>>
-- 
Website: https://people.mpi-sws.org/~jung/

Johannes Doerfert via llvm-dev

2021-Jan-26 04:33 UTC

[llvm-dev] lifetime.start/end

On 1/24/21 10:07 AM, Ralf Jung wrote:
> Hi Johannes,
>
>> At some point I lost track of what you even want to achieve.
>
> *lol* fair.^^
> All I want is a precise and meaningful spec for lifetime.start and
> lifetime.end. That's where we entered this rabbit hole. You didn't
> like Juneyoung's and Nuno's proposal for this and instead suggested to
> replace the entire system by something else, and that new system is
> what we are discussing now (that is my understanding, anyway).

I went back to the proposal in question,
https://reviews.llvm.org/D93376, and I could not right away determine
why this would avoid the inconsistency we can imagine for something like

```
#include <stdint.h>
#include <stdio.h>

int *G = 0;
__attribute__((noinline)) void bar(int *p) {
  if (G) printf("%d\n", G == p); else G = p;
}
void foo(void) {
  intptr_t xp, yp;
  { int x; xp = (intptr_t)&x; bar(&x); }
  { int y; yp = (intptr_t)&y; bar(&y); }
  // One can imagine this is folded to false and, because x and y are
  // coalesced, bar will print true.
  printf("%d\n", xp == yp);
}
```

I was opposing the proposal because of the brittle syntactic 
restrictions. In my attempt to justify not having those I also tried to 
resolve the above inconsistency in a more general manner because I 
thought that was what it was all about. I believe that we either have to 
argue the address of a dead object is unusable/undetermined, basically 
the direction of James' comment, or we need to decouple lifetime markers 
from locations.


>> Your initial problem goes away if we only coalesce and fold
>> comparisons between allocations for which the addresses are not
>> observed.
>> Let's do that and thereby benefit from all the good things we can get
>> from lifetime markers and coalescing, for alloca/malloc/...
>> Let's not go down the brittle road of syntactic limitations. It
>> doesn't allow us to do a lot of things and will come with a price as
>> we continue.
>>
>> FWIW, I think some part of the misconception is that two allocas
>> "will definitely get different addresses" which you argue "heavily
>> restricts the possible allocation choices compared to the original C
>> program".
>> The addresses are different but that is only interesting if you
>> observe them. Take the C code: `{int a,b, c = (&a == &b);
>> use(a,c);}`. I can happily coalesce `a` and `b` after the comparison
>> was folded, or `b` and `c` even if the comparison is still in place.
>> For neither transformation I need lifetime markers to do it in IR.
>
> Sure, everything in the semantics is up to what the program actually 
> observes. But I was considering the general case, in which it will not 
> be possible to prove that some address is not observed. I expect that 
> to be quite common (certainly, Rust programs really like to take 
> references to local variables and pass them to other functions), and 
> in those cases, not having lifetime.start/end will severely restrict 
> program transformations -- as you observed, even outlining is affected.
>
> This is basically the abstract version of the concrete concern that 
> James raised.
I get the argument that it can hinder optimizations but it is abstract. 
It seems pretty simple to test this out, I mean to restrict the folding 
and the coalescing. The benefit, which I think got ignored here a lot, 
is that it provides a clear path towards deduction of coalescing 
opportunities, that is, coalescing in the middle-end or backend w/o 
lifetime markers. Though, similar to the impact of my proposal, I have 
no idea if that is even needed.
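
As a rough illustration of what "restrict the folding and the coalescing" could mean, here is a toy sketch in C. It is not LLVM code: `struct alloc_info`, `can_fold_compare_to_false`, and `can_coalesce` are hypothetical names, and the single boolean stands in for whatever escape/observation analysis would actually be used.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of one allocation; not an LLVM data structure. */
struct alloc_info {
    /* True if the address may be observed by anything other than the
       comparison under consideration (escape, ptrtoint, other compares). */
    bool address_observed;
};

/* Folding `&a == &b` to false is only consistent if at least one of the
   two addresses is otherwise unobserved. */
bool can_fold_compare_to_false(const struct alloc_info *a,
                               const struct alloc_info *b) {
    assert(a && b);
    return !a->address_observed || !b->address_observed;
}

/* Coalescing two allocations (assumed here to have non-overlapping
   lifetimes) is unobservable if at least one address is unobserved.
   This is a sufficient condition, not a necessary one. */
bool can_coalesce(const struct alloc_info *a, const struct alloc_info *b) {
    assert(a && b);
    return !a->address_observed || !b->address_observed;
}
```

In the `foo`/`bar` example above both `x` and `y` are passed to `bar`, so both would count as observed and neither the fold nor the coalescing would be allowed under this rule.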

~ Johannes

>
> Kind regards,
> Ralf
>
>> A lot of my arguments around alloca/malloc/... is that we can augment 
>> the information from the frontend by determining if an address is 
>> observed or not. If it is not, we can limit its allocation, coalesce 
>> it, etc. Lifetime markers can help us do this but they are not the 
>> only way.
>> So, basically, we do not need lifetime markers to coalesce 
>> alloca/malloc/... but they can help.
>
>
>
>>
>> Let's try to get out of this rabbit hole. If you have a problem with
>> the proposal as outlined in 
>> https://lists.llvm.org/pipermail/llvm-dev/2021-January/147594.html , 
>> please let me know. I assume dealing with observed addresses 
>> differently from non-observed ones (implied by noescape) is the 
>> way forward now.
>>
>> ~ Johannes
>>
>>
>>
>>>
>>> Kind regards,
>>> Ralf
>>>
>>>
>>>>
>>>>
>>>>>
>>>>> Kind regards,
>>>>> Ralf
>>>>>
>>>>>>
>>>>>> Does that make some sense?
>>>>>>
>>>>>> ~ Johannes
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> Kind regards,
>>>>>>> Ralf
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 1/7/21 10:12 PM, Juneyoung Lee wrote:
>>>>>>>>> On Fri, Jan 8, 2021 at 8:54 AM Johannes Doerfert
>>>>>>>>> <johannesdoerfert at gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> On 1/7/21 4:55 PM, Juneyoung Lee wrote:
>>>>>>>>>>> It is, alloca must have its integral address decided at its
>>>>>>>>>>> *allocation* time as well, like 0xFF00... .
>>>>>>>>>>>
>>>>>>>>>>> For example, if 0xFFFF0000 ~ 0xFF000000 is already allocated,
>>>>>>>>>>> the new alloca cannot be placed there unless it is not observed
>>>>>>>>>>> in the future, according to your proposal.
>>>>>>>>>>>
>>>>>>>>>>> But how do we know from the current state whether the stack
>>>>>>>>>>> variable is going to be observed or not?
>>>>>>>>>> With that argument you could allocate all but 4 bytes, do a
>>>>>>>>>> malloc(4) and know the address of the returned pointer (assuming
>>>>>>>>>> it is not null).
>>>>>>>>>>
>>>>>>>>>> What I try to say is, either your scenario is part of the model
>>>>>>>>>> and everything we do is broken as you could "observe" addresses
>>>>>>>>>> passively, *or*, the abstract machine we use for semantic
>>>>>>>>>> reasoning doesn't permit the above reasoning. I really hope for
>>>>>>>>>> the latter.
>>>>>>>>>>
>>>>>>>>> That's a very valid point..!
>>>>>>>>>
>>>>>>>>> Actually, I have a memory model that addresses this problem.
>>>>>>>>> The gist is that we can interpret each allocation instruction as
>>>>>>>>> *creating 2 blocks* and nondeterministically returning one of them.
>>>>>>>>> The other one is marked as invalid but not freed immediately.
>>>>>>>>> Deallocation frees both blocks.
>>>>>>>>> If there is not enough space for having two blocks, it is treated
>>>>>>>>> as out-of-memory.
>>>>>>>>>
>>>>>>>>> It makes guessing the address of an allocation according to the
>>>>>>>>> memory layout invalid (UB).
>>>>>>>>>
>>>>>>>>> p = malloc(4)
>>>>>>>>> // If p != null, it is guaranteed there were at least two 4-byte
>>>>>>>>> // slots available, say 0x100~0x104 and 0x110~0x114.
>>>>>>>>> // Two blocks are allocated at 0x100 and 0x110, and one of the
>>>>>>>>> // addresses is nondeterministically returned.
>>>>>>>>> // By the nature of nondeterminism, all executions below should be
>>>>>>>>> // well-defined regardless of the address of p.
>>>>>>>>> *(int*)0x100 = 10; // In an execution where p is 0x110, this
>>>>>>>>>                    // raises UB.
>>>>>>>>>
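A minimal sketch of the two-block idea described above, as a toy C++ model rather than the formal model from the linked paper. The names `Allocation`, `allocate`, `store_is_defined` and `guess_is_always_defined` are hypothetical, and the `pick_second` flag merely stands in for the nondeterministic choice. It illustrates why a store to a guessed address cannot be well-defined in every execution.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Toy model only: a block is represented by its start address.
struct Allocation {
    std::uint32_t valid;   // address handed back to the program
    std::uint32_t shadow;  // reserved twin block: occupied, but invalid to access
};

// Hypothetical allocator: it needs two free slots. `pick_second` stands in
// for the nondeterministic choice of which slot is returned.
std::optional<Allocation> allocate(std::vector<std::uint32_t>& free_slots,
                                   bool pick_second) {
    if (free_slots.size() < 2) return std::nullopt;  // treated as out-of-memory
    std::uint32_t second = free_slots.back();
    free_slots.pop_back();
    std::uint32_t first = free_slots.back();
    free_slots.pop_back();
    assert(first != second);  // the two blocks never share an address
    return pick_second ? Allocation{second, first} : Allocation{first, second};
}

// A store through a guessed integer address is defined only if it hits the
// valid block of the allocation.
bool store_is_defined(const Allocation& a, std::uint32_t guessed) {
    return guessed == a.valid;
}

// A guess is acceptable only if the store is defined in every execution,
// i.e. for both outcomes of the nondeterministic choice.
bool guess_is_always_defined(std::uint32_t guessed) {
    for (bool pick_second : {false, true}) {
        std::vector<std::uint32_t> free_slots = {0x100, 0x110};
        std::optional<Allocation> a = allocate(free_slots, pick_second);
        if (!a || !store_is_defined(*a, guessed)) return false;
    }
    return true;
}
```

In this toy model neither 0x100 nor 0x110 is a safe guess, because for each of them there is one execution in which the other block was returned.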
>>>>>>>>> The paper link is here FYI:
>>>>>>>>> <https://dl.acm.org/doi/10.1145/3276495>
>>>>>>>>
>>>>>>>> Nice! Thanks for the link :)
>>>>>>>>
>>>>>>>>
>>>>>>>>>> To argue differently: Who is to say there is a stack, or only
>>>>>>>>>> one, or that alloca allocates memory "on the one stack"? That is
>>>>>>>>>> not part of the IR, IMHO. I can write a machine on which alloca
>>>>>>>>>> lowers to malloc, I don't even need the free during stack unwind
>>>>>>>>>> but that I could do as well if I wanted to.
>>>>>>>>>
>>>>>>>>> This is right, the example was for illustrative purposes. IR
>>>>>>>>> does not enforce alloca to be placed at 'stack'.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>> Making icmp/ptrtoint yield poison will still make loop
>>>>>>>>>>> versioning or pointer rewriting transformations unsound because
>>>>>>>>>>> these operations now can create poison (even if pointers are
>>>>>>>>>>> noundef).
>>>>>>>>>> I did not say they yield poison, at least I did not try to say
>>>>>>>>>> that. What are you referring to exactly?
>>>>>>>>>>
>>>>>>>>> I was referring to this:
>>>>>>>>>
>>>>>>>>>>> But this makes ptrtoint/icmp UB-raising instructions, which
>>>>>>>>>>> contradicts what LLVM does.
>>>>>>>>>> As with other violations of attributes I would, on first
>>>>>>>>>> thought, suggest to produce poison, not UB.
>>>>>>>>> But it is more about the (imaginary) attribute, so maybe I was
>>>>>>>>> slightly off topic.
>>>>>>>>>
>>>>>>>>>>> 2) is fine, I think the suggestion semantically makes sense
>>>>>>>>>>> perfectly. 1) is something I'm concerned about now.
>>>>>>>>>>>
>>>>>>>>>>> There are more than pointer foldings, such as rewriting such
>>>>>>>>>>> expression, code motion ptr cmp, introduce ptr cmp, etc. There
>>>>>>>>>>> is also analysis relying on ptr cmp.
>>>>>>>>>>> Defining the correctness of each of them is something we want to
>>>>>>>>>>> avoid, and maybe that's why we want to define precise semantics
>>>>>>>>>>> for things.
>>>>>>>>>> I don't get the point. My proposal does not change the semantics
>>>>>>>>>> of pointer comparisons, at all. I explicitly mentioned that in
>>>>>>>>>> the last email.
>>>>>>>>>>
>>>>>>>>> Oh okay, I thought it was a part of the lifetime proposal, but it
>>>>>>>>> seems more like a separate thing.
>>>>>>>>> I agree that this requires evaluating the performance impact.
>>>>>>>>> Also, investigation of existing transformations would be needed;
>>>>>>>>> Alive2's pointer comparison is still an approximation, but if it
>>>>>>>>> becomes fully precise, it will show something from running LLVM
>>>>>>>>> unit tests I believe..! :)
>>>>>>>>>
>>>>>>>>>>> I think this will be less aggressive and may give nice feedback
>>>>>>>>>>> to potential projects that are using lifetime with non-alloca.
>>>>>>>>>> The lifetime marker debate, basically 2) above, is orthogonal to
>>>>>>>>>> the problem you try to solve. It got mixed in as lifetime markers
>>>>>>>>>> were used by StackColoring to perform coalescing but that is
>>>>>>>>>> coincidental. You can (arguably) coalesce stack allocations
>>>>>>>>>> regardless of lifetime markers and with 1) such a transformation
>>>>>>>>>> (w/ and w/o lifetime markers) would actually be sound.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> At the end, we can implement IR writer that lowers lifetime with
>>>>>>>>>>> non-alloca into memset(undef). WDYT?
>>>>>>>>>> Yeah, 2) is orthogonal and we can lower it that way. Unsure if it
>>>>>>>>>> is helpful but we can certainly define it that way in the LangRef.
>>>>>>>>>>
>>>>>>>>> Okay, thanks!
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> ~ Johannes
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Juneyoung
>>>>>>>>>>>
>>>>>>>>>>> p.s. The reply was late, sorry. I think I can spend more time on
>>>>>>>>>>> this today.
>>>>>>>>>>> On Thu, Jan 7, 2021 at 9:02 AM Johannes Doerfert
>>>>>>>>>>> <johannesdoerfert at gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> On 1/6/21 4:33 PM, Juneyoung Lee wrote:
>>>>>>>>>>>>>>> Stepwisely defining the semantics of instructions is a
>>>>>>>>>>>>>>> desirable direction IMO.
>>>>>>>>>>>>>> I'm confused. What in the proposal would prevent us from
>>>>>>>>>>>>>> defining the semantics of instructions, or force us to do it
>>>>>>>>>>>>>> in an "undesirable way"?
>>>>>>>>>>>>>
>>>>>>>>>>>>> I meant it would be great if the output state after executing
>>>>>>>>>>>>> an instruction can be described using its input state.
>>>>>>>>>>>>> (that was the meaning of 'stepwise semantics', I should have
>>>>>>>>>>>>> been more clear about this)
>>>>>>>>>>>>>
>>>>>>>>>>>>> For example, the semantics of 'z = add x y' can be defined as
>>>>>>>>>>>>> follows:
>>>>>>>>>>>>> Given an input state s, next state s' = s[z -> s(x) + s(y)]
>>>>>>>>>>>>> where s(x) is the value of x in the previous state, and
>>>>>>>>>>>>> s[z -> v] is a state with z updated to v.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Another example that involves memory: the semantics of
>>>>>>>>>>>>> 'i = ptrtoint p' can be defined as follows:
>>>>>>>>>>>>> Given an input state s, next state
>>>>>>>>>>>>> s' = s[i -> s(p).obj.address + s(p).offset]
>>>>>>>>>>>>> where obj.address is the begin address of a memory object obj
>>>>>>>>>>>>> pointed by p & offset is p's byte offset. (Imagine a pointer
>>>>>>>>>>>>> to the offset of some char array).
>>>>>>>>>>>>> Note that ptrtoint & add can be nicely defined w.r.t the
>>>>>>>>>>>>> input state.
>>>>>>>>>>>>>
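The two state-update rules above can be written down as a small executable sketch. This is only an illustration of the "next state is a function of the input state" idea; the types `MemObject`, `Pointer`, `Value`, `State` and the functions `update`, `step_add`, `step_ptrtoint` are hypothetical names, not anything from LLVM or Alive2.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <variant>

struct MemObject { std::uint64_t address; };               // obj.address
struct Pointer { MemObject obj; std::uint64_t offset; };   // (obj, offset)
using Value = std::variant<std::uint64_t, Pointer>;
using State = std::map<std::string, Value>;                // register -> value

// s[reg -> v]: a copy of s with reg updated to v.
State update(State s, const std::string& reg, Value v) {
    s[reg] = std::move(v);
    return s;
}

// 'z = add x y':  s' = s[z -> s(x) + s(y)]
State step_add(const State& s, const std::string& z,
               const std::string& x, const std::string& y) {
    std::uint64_t sum = std::get<std::uint64_t>(s.at(x)) +
                        std::get<std::uint64_t>(s.at(y));
    return update(s, z, sum);
}

// 'i = ptrtoint p':  s' = s[i -> s(p).obj.address + s(p).offset]
State step_ptrtoint(const State& s, const std::string& i,
                    const std::string& p) {
    const Pointer& ptr = std::get<Pointer>(s.at(p));
    return update(s, i, ptr.obj.address + ptr.offset);
}
```

Both step functions read only the input state `s`, which is the property being asked for; neither needs to know what the rest of the program will do later.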
>>>>>>>>>>>>>
>>>>>>>>>>>>> Now, the instruction that we're facing is 'p = alloca'.
>>>>>>>>>>>>> To describe the output state after executing 'p = alloca',
>>>>>>>>>>>>> the address of new alloca should be determined.
>>>>>>>>>>>>> If observedness is involved, we need to know the future state
>>>>>>>>>>>>> again. :/ We don't know whether the alloca is going to be
>>>>>>>>>>>>> observed or not without seeing the future.
>>>>>>>>>>>>> This is the problem of the current lifetime intrinsics as
>>>>>>>>>>>>> well.
>>>>>>>>>>>> No, you mix things up here.
>>>>>>>>>>>>
>>>>>>>>>>>> Nobody proposed to modify the semantics of `alloca`.
>>>>>>>>>>>> `alloca` provides you with a fresh, unobserved block of
>>>>>>>>>>>> dereferenceable memory that is implicitly freed as the stack
>>>>>>>>>>>> unwinds. That is it. No context necessary.
>>>>>>>>>>>>
>>>>>>>>>>>> If you want to modify the IR, you need to argue the observable
>>>>>>>>>>>> semantics which is nothing new. That this might require more
>>>>>>>>>>>> than a peephole view of the program is also not new.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> One possible approach to resolve this is adding an
>>>>>>>>>>>>> 'unobserved' flag to alloca instruction (similar to what was
>>>>>>>>>>>>> suggested by Nicolai).
>>>>>>>>>>>>> And, we can say that if alloca with 'unobserved' is used by
>>>>>>>>>>>>> ptrtoint/icmp, it is UB.
>>>>>>>>>>>> The flag can be added, like we add other attributes. It should
>>>>>>>>>>>> not be required for any optimization we talked about though.
>>>>>>>>>>>> It basically is a way to manifest derived or given information
>>>>>>>>>>>> into the IR.
>>>>>>>>>>>>
>>>>>>>>>>>> Attribute deduction, as well as frontends with domain
>>>>>>>>>>>> knowledge, can add such information. The flag we discussed in
>>>>>>>>>>>> phab was not even sufficient for all the transformation
>>>>>>>>>>>> examples I presented in my mail, that is why I extended my
>>>>>>>>>>>> argument. We could still have a "noescape" flag for allocas,
>>>>>>>>>>>> but I'm not sure how useful that really is. We can certainly
>>>>>>>>>>>> deduce it and manifest it, unsure if we have domain knowledge
>>>>>>>>>>>> we can use for non-trivial cases though.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> But this makes ptrtoint/icmp UB-raising instructions, which
>>>>>>>>>>>>> contradicts what LLVM does.
>>>>>>>>>>>> As with other violations of attributes I would, on first
>>>>>>>>>>>> thought, suggest to produce poison, not UB.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> Also, existing optimizations like loop versioning can
>>>>>>>>>>>>> introduce ptrtoint/pointer comparisons too.
>>>>>>>>>>>> Sure. I am not certain why that is a problem. I get the feeling
>>>>>>>>>>>> things are still mixed up here.
>>>>>>>>>>>>
>>>>>>>>>>>> What I proposed is twofold:
>>>>>>>>>>>>
>>>>>>>>>>>> 1) We stop folding comparisons between different allocas if
>>>>>>>>>>>>    changing the address of both might be observable. Thus, if
>>>>>>>>>>>>    both might have their address "taken"/escaped, other than
>>>>>>>>>>>>    the comparisons we want to fold, we cannot proceed.
>>>>>>>>>>>>
>>>>>>>>>>>> 2) We define lifetime markers to mean `memset(undef)`.
>>>>>>>>>>>>
>>>>>>>>>>>> The first should be sufficient for the problem you were trying
>>>>>>>>>>>> to solve in the first place. The second makes lifetime markers
>>>>>>>>>>>> less weird. Note that 1) is not changing the semantics of the
>>>>>>>>>>>> IR. We basically just argue there is a bug in our instcombine
>>>>>>>>>>>> right now as we do not check all necessary preconditions.
>>>>>>>>>>>>
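The precondition in point 1) can be sketched as a small predicate. This is a deliberately simplified toy, not InstCombine or LLVM's capture tracking: the `Use` categories, the `Alloca` struct and both function names are hypothetical, and real escape analysis has many more cases.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, coarse classification of how an alloca's pointer is used.
enum class Use {
    LoadFrom,       // load through the pointer: does not reveal the address
    StoreTo,        // store through the pointer: does not reveal the address
    FoldedCompare,  // the comparison we are about to fold away
    PassedToCall,   // address may be observed by the callee
    PtrToInt,       // address becomes an integer the program can inspect
    StoredAsValue   // the pointer itself is written to memory
};

struct Alloca {
    std::vector<Use> uses;
};

// True if the address might be observed by anything other than the
// comparison(s) being folded.
bool address_may_be_observed(const Alloca& a) {
    for (Use u : a.uses) {
        if (u != Use::LoadFrom && u != Use::StoreTo && u != Use::FoldedCompare) {
            return true;
        }
    }
    return false;
}

// Point 1): folding is blocked only when *both* addresses might be observed
// elsewhere; one unobserved alloca is enough to make the fold consistent.
bool may_fold_compare(const Alloca& lhs, const Alloca& rhs) {
    return !(address_may_be_observed(lhs) && address_may_be_observed(rhs));
}
```

For instance, comparing a purely local alloca against one passed to a call may still be folded under this rule, whereas comparing two allocas that both escape may not.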
>>>>>>>>>>>>
>>>>>>>>>>>>> I see that there are other questions that I didn't answer yet,
>>>>>>>>>>>>> but let me answer this first to limit the length of the text :)
>>>>>>>>>>>> Sure, we can split the discussion :)
>>>>>>>>>>>>
>>>>>>>>>>>> ~ Johannes
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Juneyoung
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Jan 7, 2021 at 3:36 AM Johannes Doerfert
>>>>>>>>>>>>> <johannesdoerfert at gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 1/5/21 8:00 PM, Juneyoung Lee wrote:
>>>>>>>>>>>>>>> Hi Johannes,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I read your proposal and thought about the model.
>>>>>>>>>>>>>> Cool, thanks!
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> As you raised in A3, certain programs may be valid only
>>>>>>>>>>>>>>> when memory blocks with overlapping lifetimes have disjoint
>>>>>>>>>>>>>>> addresses.
>>>>>>>>>>>>>>> Look at this example (I'm using malloc, but alloca also
>>>>>>>>>>>>>>> works):
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> p1 = malloc(4)
>>>>>>>>>>>>>>> p2 = malloc(4) // for brevity, assume that there was enough
>>>>>>>>>>>>>>>                // space & p1 and p2 != null
>>>>>>>>>>>>>>> set<char*> s;
>>>>>>>>>>>>>>> s.insert(p1); s.insert(p2); // If the second insert did
>>>>>>>>>>>>>>>                             // nothing, it would be a
>>>>>>>>>>>>>>>                             // surprise to programmers
>>>>>>>>>>>>>>> work(s);
>>>>>>>>>>>>>>> free(p1)
>>>>>>>>>>>>>>> free(p2)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Clearly, IR semantics should guarantee that escaped blocks
>>>>>>>>>>>>>>> are disjoint. It would be great for verification tools on
>>>>>>>>>>>>>>> LLVM IR to be able to answer that the second insert will
>>>>>>>>>>>>>>> succeed.
>>>>>>>>>>>>>> I agree, the second insert should succeed, assuming
>>>>>>>>>>>>>> `p1 && p2`.
>>>>>>>>>>>>>> I don't think my proposal would in any way impact the program
>>>>>>>>>>>>>> above, if anything the A3 reasoning makes sure such a program
>>>>>>>>>>>>>> with allocas is not miscompiled.
>>>>>>>>>>>>>>
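For reference, a compilable C++ rendering of the example being discussed. The function name `count_distinct_live_blocks` is made up for this sketch, and the `work(s)` call from the pseudocode is omitted since its body is not given in the thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <set>

// Returns the number of distinct addresses stored in the set, or 0 if
// either allocation failed.
std::size_t count_distinct_live_blocks() {
    char* p1 = static_cast<char*>(std::malloc(4));
    char* p2 = static_cast<char*>(std::malloc(4));
    std::size_t n = 0;
    if (p1 && p2) {
        std::set<char*> s;
        s.insert(p1);
        s.insert(p2);  // both blocks are live here, so p2 must differ from p1
        n = s.size();
    }
    std::free(p1);  // free(nullptr) is a no-op, so this is safe on failure
    std::free(p2);
    return n;
}
```

Whenever both allocations succeed the function returns 2, which is the guarantee both sides of the exchange agree the IR semantics must preserve.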
>>>>>>>>>>>>>> I'm also not sure I understand what you try to argue for.
>>>>>>>>>>>>>> Maybe elaborate a bit what it is you think is bad or needs to
>>>>>>>>>>>>>> be changed.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The problem is that definition of escapedness is not clear
>>>>>>>>>>>>>>> at the semantic level. Describing the IR semantics w.r.t.
>>>>>>>>>>>>>>> LLVM's escaped analysis function isn't something we want.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Semantic definition of escapedness of a pointer seems hard,
>>>>>>>>>>>>>>> I mean in a stepwise manner.
>>>>>>>>>>>>>>> It isn't a single instruction such as 'escape i8* ptr', and
>>>>>>>>>>>>>>> we need to look over all instructions in the function. For
>>>>>>>>>>>>>>> example, '(int)(p+1) - (int)p' isn't semantically escaping
>>>>>>>>>>>>>>> the pointer p because the result is 1 regardless of the
>>>>>>>>>>>>>>> value of p.
>>>>>>>>>>>>>>> Stepwisely defining the semantics of instructions is a
>>>>>>>>>>>>>>> desirable direction IMO.
>>>>>>>>>>>>>> I'm confused. What in the proposal would prevent us from
>>>>>>>>>>>>>> defining the semantics of instructions, or force us to do it
>>>>>>>>>>>>>> in an "undesirable way"?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> In practice, syntactically checking escapedness + nice
>>>>>>>>>>>>>>> engineering might not break optimizations in most cases (as
>>>>>>>>>>>>>>> undef/poison did); but it would be great to move to another
>>>>>>>>>>>>>>> level, since LLVM IR is used in so many places :)
>>>>>>>>>>>>>> The property under which you can coalesce objects is simple:
>>>>>>>>>>>>>>       It is not observable.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Now, if the address of one of the two objects you coalesce is
>>>>>>>>>>>>>> not observed, coalescing is not observable. That is a
>>>>>>>>>>>>>> sufficient condition, not a necessary one. Pointer "escaping"
>>>>>>>>>>>>>> is one step further. If the address doesn't escape it is not
>>>>>>>>>>>>>> observed. This does not mean the "semantic conditions for
>>>>>>>>>>>>>> coalescing", e.g., for the purpose of translation validation,
>>>>>>>>>>>>>> are supposed to be built on top of our "definition of escaping
>>>>>>>>>>>>>> pointers". That said, we use "does not escape" as a
>>>>>>>>>>>>>> precondition for various transformations and I'm unsure what
>>>>>>>>>>>>>> is any different now. The entire escape argument is only used
>>>>>>>>>>>>>> in the validity of the pointer folding.
>>>>>>>>>>>>>> Similarly, we can fold a comparison of a noalias pointer with
>>>>>>>>>>>>>> another one if the former does not escape (and both are
>>>>>>>>>>>>>> dereferenced and one is written for sure).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The pointer comparison is another beast. It indeed has a few
>>>>>>>>>>>>>>> issues, and solving it might require nontrivial solution.
>>>>>>>>>>>>>> I think the main problem of the inconsistencies and such we've
>>>>>>>>>>>>>> seen is rooted in the erroneous folding of pointer
>>>>>>>>>>>>>> comparisons. Cleaning up the lifetime marker semantics is
>>>>>>>>>>>>>> actually unrelated and simply not folding as described in A3
>>>>>>>>>>>>>> should solve the issue that has been reported. Nevertheless,
>>>>>>>>>>>>>> we should take a crack at lifetime markers while we are here.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ~ Johannes
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> Juneyoung
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tue, Jan 5, 2021 at 9:37 AM Johannes Doerfert
>>>>>>>>>>>>>>> <johannesdoerfert at gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi Juneyoung,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Happy new year :)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> After we had a lengthy discussion on phab last year, I've
>>>>>>>>>>>>>>>> tried to summarize my thoughts, especially given that I had
>>>>>>>>>>>>>>>> some time to think about things over the break.
>>>>>>>>>>>>>>>> Still, no promises on the quality ;)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I start with general questions I asked myself and then go on
>>>>>>>>>>>>>>>> rambling about a potential design.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Q1: Is lifetime a given property or a derived one, thus is
>>>>>>>>>>>>>>>> it fixed or modifiable?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> This is a question I asked myself a lot recently. I think it
>>>>>>>>>>>>>>>> is derived and modifiable, at least I hope it is. Only that
>>>>>>>>>>>>>>>> would allow transformations I would want us to do. Here are
>>>>>>>>>>>>>>>> some examples:
>>>>>>>>>>>>>>>>       https://godbolt.org/z/G8obj3
>>>>>>>>>>>>>>>>       https://godbolt.org/z/obaTc
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Q2: Is a pointer comparison, or similar use, extending the
>>>>>>>>>>>>>>>> lifetime?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Asked differently, can we hoist a pointer comparison into a
>>>>>>>>>>>>>>>> region where the pointer is dead?
>>>>>>>>>>>>>>>> This is an important question which we haven't discussed
>>>>>>>>>>>>>>>> much as we assumed LICM has to work.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The current behavior is that non-dereferencing uses are not
>>>>>>>>>>>>>>>> extending the lifetime and are allowed outside of "lifetime
>>>>>>>>>>>>>>>> regions" (as indicated by markers). They will always produce
>>>>>>>>>>>>>>>> valid results. Though, we might want to think about a
>>>>>>>>>>>>>>>> lifetime marker that spits out a new pointer value instead
>>>>>>>>>>>>>>>> of reusing the old one.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Q3: Can we use lifetime to argue about addresses?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The question here is, can we fold address comparisons based
>>>>>>>>>>>>>>>> on lifetimes, or not.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> So far, we fold comparisons based on "address information".
>>>>>>>>>>>>>>>> For example, we "know" globals, allocas, and mallocs cannot
>>>>>>>>>>>>>>>> be equal to one another. Also, two distinct allocations, for
>>>>>>>>>>>>>>>> globals and allocas, are considered unequal. Now, the crux
>>>>>>>>>>>>>>>> is that we have to be consistent if we do two comparisons,
>>>>>>>>>>>>>>>> and, as of now, we are not (bug number missing). Since the
>>>>>>>>>>>>>>>> backend (or any other place for that matter) might coalesce
>>>>>>>>>>>>>>>> allocas, their addresses might not be different after all.
>>>>>>>>>>>>>>>> If we already folded a comparison to "unequal" we are doomed
>>>>>>>>>>>>>>>> if we later have a comparison that results in "equal".
>>>>>>>>>>>>>>>> (Note, this is different from aliasing rules as they can be
>>>>>>>>>>>>>>>> stricter.)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Design:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I would hope we can come up with a model that treats memory
>>>>>>>>>>>>>>>> "the same", regardless if it is global, stack, or heap. I
>>>>>>>>>>>>>>>> want to avoid special casing one of them wrt. lifetime as I
>>>>>>>>>>>>>>>> believe most optimizations would apply to any of them,
>>>>>>>>>>>>>>>> potentially for different reasons and with different gains,
>>>>>>>>>>>>>>>> but nevertheless.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Proposal (largely based on the conversation in phab):
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> A1: Lifetime is a concept that talks about memory content
>>>>>>>>>>>>>>>>     *only*. Basically, the memory content is set to undefined
>>>>>>>>>>>>>>>>     by lifetime markers. It is derived/modifiable as it is
>>>>>>>>>>>>>>>>     just an "as-is" property of the memory content. The
>>>>>>>>>>>>>>>>     lifetimes of an object, as described by markers, might
>>>>>>>>>>>>>>>>     change during the compilation. They might become smaller
>>>>>>>>>>>>>>>>     if we deduce the object is not accessed and the memory
>>>>>>>>>>>>>>>>     content is not used, they might become larger if objects
>>>>>>>>>>>>>>>>     with non-overlapping lifetimes are coalesced. (One could
>>>>>>>>>>>>>>>>     see the latter as introducing a new object though.)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> A2: If we define lifetime as above, it has nothing to do with
>>>>>>>>>>>>>>>>     the address of an object. Consequently, pointer
>>>>>>>>>>>>>>>>     comparisons and similar operations are valid outside the
>>>>>>>>>>>>>>>>     lifetime. Loads and stores are as well, they can even not
>>>>>>>>>>>>>>>>     be removed "easily". A store followed by a lifetime marker
>>>>>>>>>>>>>>>>     or a load following a lifetime marker is dead or results
>>>>>>>>>>>>>>>>     in undef respectively.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> A3: We could not use lifetime to argue about addresses. This
>>>>>>>>>>>>>>>>     means we could/should also not argue that overlapping
>>>>>>>>>>>>>>>>     lifetimes result in different addresses. Thus, a
>>>>>>>>>>>>>>>>     comparison between the address of two allocas could not
>>>>>>>>>>>>>>>>     immediately be folded. That said, there would be special
>>>>>>>>>>>>>>>>     cases though. Basically, if one of the allocas does not
>>>>>>>>>>>>>>>>     escape, other than the comparisons to be folded, we can
>>>>>>>>>>>>>>>>     fold them. Afterwards, coalescing or splitting would
>>>>>>>>>>>>>>>>     still be consistent because it is unobservable. The
>>>>>>>>>>>>>>>>     problem we have in-tree is that we fold even though the
>>>>>>>>>>>>>>>>     address is still observed (after the fold). It is still
>>>>>>>>>>>>>>>>     unclear to me what the impact of this would be on real
>>>>>>>>>>>>>>>>     code. I suggested before that we run some experiments
>>>>>>>>>>>>>>>>     first before we make any decision whatsoever.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> This is pretty much saying that lifetime markers are
>>>>>>>>>>>>>>>> `memset(undef)`, as you suggested before (I think).
>>>>>>>>>>>>>>>> There are some implementation level differences but at the
>>>>>>>>>>>>>>>> end of the day they are basically the same.
>>>>>>>>>>>>>>>>
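The "lifetime markers are `memset(undef)`" reading from A1 and A2 can be illustrated with a toy memory block. This is a sketch under that proposed reading only, not LLVM's actual semantics; `Byte`, `Block` and `lifetime_marker` are hypothetical names, and `std::nullopt` stands in for an undef byte.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

using Byte = std::optional<unsigned char>;  // std::nullopt models an undef byte

struct Block {
    std::vector<Byte> bytes;

    // Freshly allocated memory starts out undef.
    explicit Block(std::size_t n) : bytes(n) {}

    void store(std::size_t i, unsigned char v) { bytes.at(i) = v; }
    Byte load(std::size_t i) const { return bytes.at(i); }

    // Proposed reading of lifetime.start / lifetime.end: the contents become
    // undef, while the block itself (and thus its address) is untouched.
    void lifetime_marker() {
        for (Byte& b : bytes) b = std::nullopt;
    }
};
```

Under this model a store immediately followed by a marker is dead, and a load right after a marker yields undef, matching the two consequences listed in A2.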
>>>>>>>>>>>>>>>> Happy to hear some thoughts on this, especially if it fixes
>>>>>>>>>>>>>>>> the problems that lead to D93376 in the first place.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> ~ Johannes
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 12/18/20 2:42 AM, Juneyoung Lee via llvm-dev wrote:
>>>>>>>>>>>>>>>>> Hello all,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> We're discussing the well-formedness of lifetime.start/end
>>>>>>>>>>>>>>>>> intrinsic here (https://reviews.llvm.org/D93376), deciding
>>>>>>>>>>>>>>>>> what is a (syntactically & semantically) valid use of these
>>>>>>>>>>>>>>>>> intrinsics.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I'd like to gather more context about the intrinsics.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 1. Is there a frontend or framework that introduces lifetime
>>>>>>>>>>>>>>>>> call with non-stack allocated objects?
>>>>>>>>>>>>>>>>> If exists, which behavior do they expect from it?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 2. Can a block's lifetime be started bytewise or
>>>>>>>>>>>>>>>>> elementwise?
>>>>>>>>>>>>>>>>> I imagine an optimization that allows using stack very
>>>>>>>>>>>>>>>>> compactly, but wonder whether there is a real-world use
>>>>>>>>>>>>>>>>> case.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>> Juneyoung
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
_______________________________________________
>>>>>>>>>>>>>>>>> LLVM
Developers mailing list
>>>>>>>>>>>>>>>>> llvm-dev at
lists.llvm.org
>>>>>>>>>>>>>>>>>
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>>>>>>
>>>>>>>
>>>>>
>>>
>
