thr3ads.net - llvm dev - [llvm-dev] lifetime.start/end [Jan 2021]

If this information is useful, please help other people find it:
Share via:

Ralf Jung via llvm-dev

2021-Jan-13 11:14 UTC

[llvm-dev] lifetime.start/end

Hi Johannes,
>>> The tricky part is, do we see `%c = false` at compile time
>>> or at runtime. If it's the former, we might run into the same
coalescing
>>> issue we have for allocas:
>>>
>>> p = malloc(1)
>>> free(q)
>>> q = malloc(1)
>>> free(p)
>>> %c = icmp eq %p, %q
>>>
>>> What is `%c` now? It could go either way, depending on your
allocator, agreed?
>>
>> Agreed.
>>
>>> If so, we need to ensure
>>> there is not a second `%c2 = icmp eq %p, %q` somewhere if we want
to fold
>>> `%c` as it needs to be consistent.
>>
>> Indeed -- if we constrain the non-deterministic choice that the
allocator is
>> making, we need to do so consistently.
>>
>> On Phabricator, you wrote
>>
>>> We could instead coalesce allocas in the middle end (more
aggressively and
>>> not only based on lifetime markers) in order to fold comparisons of
the "then
>>> equal" allocas to "equal".
>>
>> So, to me this sounded like you wanted to fold `icmp` to `true` in
situations
>> like in my example. It looks like I misunderstood.
>> So is your proposal instead that we can coalesce allocations *after* we
have
>> ensured that there definitely is no `icmp` on their pointer any more?
Like,
>> for local variables, we first fold away all `icmp` and then we can
freely
>> coalesce since now it's not observable that we are reusing memory?
> 
> Let me start with:
> 
> My phab response is "stale" by now. My first email to this thread
was written
> after I had more time to think about all this:
> https://lists.llvm.org/pipermail/llvm-dev/2021-January/147594.html
> 
> Separation of concerns, yes. If you can't observe we coalesced, we can
coalesce,
> or choose not to. My main argument is that the folding we do
> right now is broken, or the coalescing is. We cannot fold, coalesce, and
then
> fold again (which could be runtime eval), as that leads to
> inconsistencies.
I think this is equivalent to what I said, namely that we can only coalesce when
there are no comparison checks left -- it's not just that we cannot fold
icmp
any more after colaescing, we cannot have any icmp, folded or not, since 
runtime-checking of ptr equality will yield the wrong result after coalescing.
(I have a hard time conceptualizing runtime eval as a kind of "fold"
the way you
do.^^ So I prefer to think of this in terms of the conditions that have to hold 
when coalescing happens: no icmp left on the involved pointers.)
>> However, if we translate alloca to malloc+free, all (almost all?) of
these
>> allocations *will* overlap in their lifetime, so 'false'
(inequal) is the only
>> thing we could ever fold anything to (and we don't have to worry
about
>> consistency here since this is the only possible choice). I guess after
we did
>> that, if there's nothing else observing these addresses, we could
indeed
>> actually reuse the underlying storage...
>>
>> Am I on the right track?
> 
> I guess initially we would only fold to `inequal`, which is fine, until we
start
> to coalesce locations with non-overlapping lifetime early.
> Though, now that I said this, I'm not even sure we could ever fold
after
> coalescing, thus we could ever coalesce if the addresses (of both)
> are "still observed". Maybe folding is not the problem but
coalescing is, though
> I think both are in the most general sense. I think I can
> fabricate a reasonable flow of transformation in which early *and* partial 
> folding is bad:
> 
> a = alloc
> b = alloc
> ... // no use of a or b
> foo(a)
> ... // no use of a or b
> bar(b)
> ... // no use of a or b
> c = (a == b)
> 
> now we assume that the two calls are outlined after we fold `(a == b)` to
`false`:
> 
> void foocall() {
> a = alloc
> foo(a)
> }
> void barcall() {
> a = alloc
> bar(a)
> }
> 
> foocall()
> barcall()
> c = true
> 
> now we assume foo stores away `a` and bar compares it to `b`. That would be
a
> problem.
> 
> WDYT?
Hm, yeah, this shows that even outlining is a problem since it can make 
previously overlapping allocations non-overlapping. In a sense, there is a step 
happening before outlining here, which is to move around the alloca and explicit
free to make the code more like

... // no use of a or b
a = alloc
foo(a)
free a
... // no use of a or b
b = alloc
bar(b)
free b
... // no use of a or b
c = (a == b)

It is this reordering of (de)allocation that is incorrect; the outlining itself 
is not the problem.

This also reflects my general concern with your approach: there's a lot of 
information loss. When I write C (or Rust) code, there is no guarantee that 
different local variables have different addresses. And yet when I translate 
that code to LLVM by doing

a = alloca
b = alloca
[...]

now I have generated LLVM IR where a and b will definitely get different 
addresses (if alloca works anything like a regular allocation operation). This 
is a correct translation from C to LLVM IR, but it heavily restricts the 
possible allocation choices compared to the original C program. So at this point
frontends would be well-advised to not generate code like this, but instead 
carefully place the alloca as late as possible. Frontends will also want to have
an explicit "free" for stack-slots to indicate when they are not
needed any more.
It is my understanding that this is exactly where the lifetime.start and 
lifetime.end markers come in: they are *prescriptive* (in the terminology of 
your email: they are a given property, and thus fixed -- modulo the usual
"as
if" rule, of course); they are how the frontend informs the later stages of
the
constraints on which alloca may overlap and which may not. If the lifetime 
markers cannot play this role, then something else should. There needs to be 
*some* way for the frontend to encode this information.

Kind regards,
Ralf

> 
> 
>>
>> Kind regards,
>> Ralf
>>
>>>
>>> Does that make some sense?
>>>
>>> ~ Johannes
>>>
>>>
>>>>
>>>> Kind regards,
>>>> Ralf
>>>>
>>>>>
>>>>>
>>>>> On 1/7/21 10:12 PM, Juneyoung Lee wrote:
>>>>>> On Fri, Jan 8, 2021 at 8:54 AM Johannes Doerfert
<johannesdoerfert at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> On 1/7/21 4:55 PM, Juneyoung Lee wrote:
>>>>>>>> It is, alloca must have its integral address
decided at its *allocation*
>>>>>>>> time as well, like 0xFF00... .
>>>>>>>>
>>>>>>>> For example, if 0xFFFF0000 ~ 0xFF000000 is
already allocated, the new
>>>>>>>> alloca cannot be placed there unless it is not
observed in the future,
>>>>>>>> according to your proposal.
>>>>>>>>
>>>>>>>> But how do we know from the current state
whether the stack variable is
>>>>>>>> going to be observed or not?
>>>>>>> With that argument you could allocate all but 4
bytes, do a malloc(4)
>>>>>>> and know the address of the returned pointer
(assuming it is not null).
>>>>>>>
>>>>>>> What I try to say is, either your scenario is part
of the model and
>>>>>>> everything we do is broken as you could
"observe" addresses passively,
>>>>>>> *or*, the abstract machine we use for semantic
reasoning doesn't permit
>>>>>>> the above reasoning. I really hope for the latter.
>>>>>>>
>>>>>> That's a very valid point..!
>>>>>>
>>>>>> Actually, I have a memory model that addresses this
problem.
>>>>>> The gist is that we can interpret each allocation
instruction as *creating
>>>>>> 2 blocks* and nondeterministically returning one of
them.
>>>>>> The other one is marked as invalid but not freed
immediately. Deallocation
>>>>>> frees both blocks.
>>>>>> If there is not enough space for having two blocks, it
is treated as
>>>>>> out-of-memory.
>>>>>>
>>>>>> It makes guessing the address of an allocation
according to memory layout
>>>>>> invalid UB.
>>>>>>
>>>>>> p = malloc(4)
>>>>>> // If p != null, it is guaranteed there was at least
two 4 byte slots
>>>>>> available, say 0x100~0x104 and 0x110~0x114.
>>>>>> // Two blocks are allocated at 0x110 and 0x114, and one
of the addresses is
>>>>>> nondeterministically returned.
>>>>>> // By the nature of nondeterminism, all executions
below should be
>>>>>> well-defined regardless of the address of p.
>>>>>> *(int*)0x100 = 10; // In an execution where p is 0x110,
this raises UB.
>>>>>>
>>>>>> The paper link is here
<https://dl.acm.org/doi/10.1145/3276495> FYI.
>>>>>
>>>>> Nice! Thanks for the link :)
>>>>>
>>>>>
>>>>>> To argue differently: Who is to say there is a stack,
or only one,
>>>>>>> or that alloca allocates memory "on the one
stack"? That is not part of
>>>>>>> the IR, IMHO. I can write a machine on which alloca
lowers to malloc,
>>>>>>> I don't even need the free during stack unwind
but that I could do as
>>>>>>> well if I wanted to.
>>>>>>
>>>>>> This is right, the example was for illustrative
purposes. IR does not
>>>>>> enforce alloca to be placed at 'stack'.
>>>>>>
>>>>>>
>>>>>>>> Making icmp/ptrtoint yield poison will still
make loop versioning or
>>>>>>>> pointer rewriting transformations unsound
because these operations now
>>>>>>> can
>>>>>>>> create poison (even if pointers are noundef).
>>>>>>> I did not say they yield poison, at least I did not
try to say that.
>>>>>>> What are you referring to exactly?
>>>>>>>
>>>>>> I was referring to this:
>>>>>>
>>>>>>>> But this makes ptrtoint/icmp make UB-raising
instructions, which
>>>>>>>> contradicts with what LLVM does.
>>>>>>> As with other violation of attributes I would, on
first though, suggest
>>>>>>> to produce poison, not UB.
>>>>>> But it is more about the (imaginary) attribute, so
maybe I was slightly out
>>>>>> of topic.
>>>>>>
>>>>>>> 2) is fine, I think the suggestion semantically
makes sense perfectly. 1)
>>>>>>>> is something I'm concerned about now.
>>>>>>>>
>>>>>>>> There are more than pointer foldings, such as
rewriting such expression,
>>>>>>>> code motion ptr cmp, introduce ptr cmp, etc.
There is also analysis
>>>>>>> relying
>>>>>>>> on ptr cmp.
>>>>>>>> Defining the correctness of each of them is
something we want to avoid,
>>>>>>> and
>>>>>>>> maybe that's why we want to define precise
semantics for things.
>>>>>>> I don't get the point. My proposal does not
change the semantics of
>>>>>>> pointer comparisons, at all. I explicitly mentioned
that in the last
>>>>>>> email.
>>>>>>>
>>>>>> Oh okay, I thought it was a part of the lifetime
proposal, but it seems
>>>>>> more like a separate thing.
>>>>>> I agree that this requires performance impact.
>>>>>> Also investigation of existing transformations would be
needed; Alive2's
>>>>>> pointer comparison is doing approximation yet, but if
it becomes fully
>>>>>> precise, it will show something from running LLVM unit
tests I believe..! :)
>>>>>>
>>>>>>> I think this will be less aggressive and may give
nice feedback to
>>>>>>>> potential projects that are using lifetime with
non-alloca.
>>>>>>> The lifetime marker debate, basically 2) above, is
orthogonal to the
>>>>>>> problem
>>>>>>> you try to solve. It got mixed in as lifetime
markers were used by
>>>>>>> StackColoring
>>>>>>> to perform coalescing but that is coincidental. You
can (arguably)
>>>>>>> coalesce stack
>>>>>>> allocations regardless of lifetime markers and with
1) such a
>>>>>>> transformation
>>>>>>> (w/ and w/o lifetime markers) would actually be
sound.
>>>>>>>
>>>>>>>
>>>>>>>> At the end, we can implement IR writer that
lowers lifetime with
>>>>>>> non-alloca
>>>>>>>> into memset(undef). WDYT?
>>>>>>> Yeah, 2) is orthogonal and we can lower it that
way. Unsure if it is
>>>>>>> helpful
>>>>>>> but we can certainly define it that way in the
LangRef.
>>>>>>>
>>>>>> Okay, thanks!
>>>>>>
>>>>>>
>>>>>>> ~ Johannes
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Juneyoung
>>>>>>>>
>>>>>>>> p.s. The reply was late, sorry. I think I can
spend more time on this
>>>>>>> today.
>>>>>>>> On Thu, Jan 7, 2021 at 9:02 AM Johannes
Doerfert <
>>>>>>> johannesdoerfert at gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> On 1/6/21 4:33 PM, Juneyoung Lee wrote:
>>>>>>>>>>>> Stepwisely defining the
semantics of instructions is a desirable
>>>>>>>>>> direction
>>>>>>>>>>>> IMO.
>>>>>>>>>>> I'm confused. What in the
proposal would prevent us from defining
>>>>>>>>>>> the semantics of instructions, or
force us to do it in an "undesirable
>>>>>>>>>> way"?
>>>>>>>>>>
>>>>>>>>>> I meant it would be great if the output
state after executing an
>>>>>>>>>> instruction can be described using its
input state.
>>>>>>>>>> (that was the meaning of 'stepwise
semantics', I should have been more
>>>>>>>>>> clear about this)
>>>>>>>>>>
>>>>>>>>>> For example, the semantics of 'z =
add x y' can be defined as follows:
>>>>>>>>>> Given an input state s, next state
s' = s[z -> s(x) + s(y)]
>>>>>>>>>> where s(x) is the value of x in the
previous state, and s[z -> v] is a
>>>>>>>>>> state with z updated to v.
>>>>>>>>>>
>>>>>>>>>> Another example that involves memory:
the semantics of 'i = ptrtoint p'
>>>>>>>>> can
>>>>>>>>>> be defined as follows:
>>>>>>>>>> Given an input state s, next state
s' = s[i -> s(p).obj.address +
>>>>>>>>>> s(p).offset]
>>>>>>>>>> where obj.address is the begin address
of a memory object obj pointed
>>>>>>> by
>>>>>>>>> p
>>>>>>>>>> & offset is p's byte offset.
(Imagine a pointer to the offset of some
>>>>>>>>> char
>>>>>>>>>> array).
>>>>>>>>>> Note that ptrtoint & add can be
nicely defined w.r.t the input state.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Now, the instruction that we're
facing is 'p = alloca'.
>>>>>>>>>> To describe the output state after
executing 'p = alloca', the address
>>>>>>> of
>>>>>>>>>> new alloca should be determined.
>>>>>>>>>> If observedness is involved, we need to
know the future state again. :/
>>>>>>>>> We
>>>>>>>>>> don't know whether the alloca is
going to be observed or not without
>>>>>>>>> seeing
>>>>>>>>>> the future.
>>>>>>>>>> This is the problem of the current
lifetime intrinsics as well.
>>>>>>>>> No, you mix things up here.
>>>>>>>>>
>>>>>>>>> Nobody proposed to modify the semantics of
`alloca`.
>>>>>>>>> `alloca` provides you with a fresh,
unobserved block of
>>>>>>>>> dereferenceable memory that is implicitly
freed as the stack
>>>>>>>>> unwinds. That is it. No context necessary.
>>>>>>>>>
>>>>>>>>> If you want to modify the IR, you need to
argue the observable
>>>>>>>>> semantics which is nothing new. That this
might require more than
>>>>>>>>> a peephole view of the program is also not
new.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> One possible approach to resolve this
is adding an 'unobserved' flag to
>>>>>>>>>> alloca instruction (similar to what was
suggested by Nicolai).
>>>>>>>>>> And, we can say that if alloca with
'unobserved' is used by
>>>>>>>>> ptrtoint/icmp,
>>>>>>>>>> it is UB.
>>>>>>>>> The flag can be added, like we add other
attributes. It should not
>>>>>>>>> be required for any optimization we talked
about though. It basically
>>>>>>>>> is a way to manifest derived or given
information into the IR.
>>>>>>>>>
>>>>>>>>> Attribute deduction, as well as frontends
with domain knowledge,
>>>>>>>>> can add such information. The flag we
discussed in phab was not even
>>>>>>>>> sufficient for all the transformation
examples I presented in my mail,
>>>>>>>>> that is why I extended my  argument. We
could still have a "noescape"
>>>>>>>>> flag for allocas, but I'm not sure how
useful that really is. We can
>>>>>>>>> certainly deduce it and manifest it, unsure
if we have domain knowledge
>>>>>>>>> we can use for non-trivial cases though.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> But this makes ptrtoint/icmp make
UB-raising instructions, which
>>>>>>>>>> contradicts with what LLVM does.
>>>>>>>>> As with other violation of attributes I
would, on first though, suggest
>>>>>>>>> to produce poison, not UB.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Also, existing optimizations like loop
versioning can introduce
>>>>>>>>>> ptrtoint/pointer comparisons too.
>>>>>>>>> Sure. I am not certain why that is a
problem. I get the feeling things
>>>>>>>>> are still mixed up here.
>>>>>>>>>
>>>>>>>>> What I proposed is twofold:
>>>>>>>>>
>>>>>>>>> 1) We stop folding comparisons between
different allocas if changing the
>>>>>>>>> address
>>>>>>>>>       of both might be observable. Thus, if
both might have their address
>>>>>>>>> "taken"/escaped,
>>>>>>>>>       other than the comparisons we want to
fold, we cannot proceed.
>>>>>>>>>
>>>>>>>>> 2) We define lifetime markers to mean
`memset(undef)`.
>>>>>>>>>
>>>>>>>>> The first should be sufficient for the
problem you were trying to solve
>>>>>>>>> in the
>>>>>>>>> first place. The second makes lifetime
markers less weird. Note that 1)
>>>>>>>>> is not changing
>>>>>>>>> the semantics of the IR. We basically just
argue there is a bug in our
>>>>>>>>> instcombine right
>>>>>>>>> now as we do not check all necessary
preconditions.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> I see that there are other questions
that I didn't answer yet, but let
>>>>>>> me
>>>>>>>>>> answer this first to limit the length
of the text :)
>>>>>>>>> Sure, we can split the discussion :)
>>>>>>>>>
>>>>>>>>> ~ Johannes
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Juneyoung
>>>>>>>>>>
>>>>>>>>>> On Thu, Jan 7, 2021 at 3:36 AM Johannes
Doerfert <
>>>>>>>>> johannesdoerfert at gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> On 1/5/21 8:00 PM, Juneyoung Lee
wrote:
>>>>>>>>>>>> Hi Johannes,
>>>>>>>>>>>>
>>>>>>>>>>>> I read your proposal and
thought about the model.
>>>>>>>>>>> Cool, thanks!
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> As you concerned in A3, certain
programs may be valid only when
>>>>>>> memory
>>>>>>>>>>>> blocks with overlapping
lifetimes have disjoint addresses.
>>>>>>>>>>>> Look at this example (I'm
using malloc, but alloca also works):
>>>>>>>>>>>>
>>>>>>>>>>>> p1 = malloc(4)
>>>>>>>>>>>> p2 = malloc(4) // for brevity,
assume that there as enough space & p1
>>>>>>>>> and
>>>>>>>>>>>> p2 != null
>>>>>>>>>>>> set<char*> s;
>>>>>>>>>>>> s.insert(p1); s.insert(p2); //
If the second insert did nothing, it
>>>>>>>>> would
>>>>>>>>>>>> be surprise to programmers
>>>>>>>>>>>> work(s);
>>>>>>>>>>>> free(data1)
>>>>>>>>>>>> free(data2)
>>>>>>>>>>>>
>>>>>>>>>>>> Clearly, IR semantics should
guarantee that escaped blocks are
>>>>>>>>> disjoint.
>>>>>>>>>>> It
>>>>>>>>>>>> would be great for verification
tools on LLVM IR to be able to answer
>>>>>>>>>>> that
>>>>>>>>>>>> the second insert will succeed.
>>>>>>>>>>> I agree, the second insert should
succeed, assuming `p1 && p2`.
>>>>>>>>>>> I don't think my proposal would
in any way impact the program above,
>>>>>>>>>>> if anything the A3 reasoning makes
sure such a program with allocas
>>>>>>>>>>> is not miscompiled.
>>>>>>>>>>>
>>>>>>>>>>> I'm also not sure I understand
what you try to argue for. Maybe
>>>>>>>>>>> elaborate a bit what it is you
think is bad or needs to be changed.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> The problem is that definition
of escapedness is not clear at the
>>>>>>>>>>> semantic
>>>>>>>>>>>> level. Describing the IR
semantics w.r.t. LLVM's escaped analysis
>>>>>>>>>>> function
>>>>>>>>>>>> isn't something we want.
>>>>>>>>>>>>
>>>>>>>>>>>> Semantic definition of
escapedness of a pointer seems hard, I mean
>>>>>>> in a
>>>>>>>>>>>> stepwise manner.
>>>>>>>>>>>> It isn't a single
instruction such as 'escape i8* ptr', and we need
>>>>>>> to
>>>>>>>>>>> look
>>>>>>>>>>>> over all instructions in the
function. For example, '(int)(p+1) -
>>>>>>>>> (int)p'
>>>>>>>>>>>> isn't semantically escaping
the pointer p because the result is 1
>>>>>>>>>>>> regardless of the value of p.
>>>>>>>>>>>> Stepwisely defining the
semantics of instructions is a desirable
>>>>>>>>>>> direction
>>>>>>>>>>>> IMO.
>>>>>>>>>>> I'm confused. What in the
proposal would prevent us from defining
>>>>>>>>>>> the semantics of instructions, or
force us to do it in an "undesirable
>>>>>>>>>>> way"?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> In practice, syntactically
checking escapedness + nice engineering
>>>>>>>>> might
>>>>>>>>>>>> not break optimizations in most
cases (as undef/poison did); but it
>>>>>>>>> would
>>>>>>>>>>>> be great to move to another
level, since LLVM IR is used in so many
>>>>>>>>>>> places
>>>>>>>>>>>> :)
>>>>>>>>>>> The property under which you can
coalesce objects is simple:
>>>>>>>>>>>       It is not observable.
>>>>>>>>>>>
>>>>>>>>>>> Now, if the address of one of the
two objects you coalesce is not
>>>>>>>>>>> observed, coalescing is not
observable. That is a sufficient condition
>>>>>>>>>>> not a necessary one. Pointer
"escaping" is one step further. If the
>>>>>>>>>>> address doesn't escape it is
not observed. This does not mean the
>>>>>>>>>>> "semantic conditions for
coalescing", e.g., for the purpose of
>>>>>>>>> translation
>>>>>>>>>>> validation, is supposed to be build
on top of our "definition of
>>>>>>>>> escaping
>>>>>>>>>>> pointers". That said, we use
"does not escape" as a precondition for
>>>>>>>>>>> various transformation and I'm
unsure what is any different now. The
>>>>>>>>>>> entire escape argument is only used
in the validity of the pointer
>>>>>>>>> folding.
>>>>>>>>>>> Similarly, we can fold a comparison
of a noalias pointer with another
>>>>>>>>> one
>>>>>>>>>>> if the former does not escape (and
both are dereferenced and one is
>>>>>>>>>>> written for sure).
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> The pointer comparison is
another beast. It indeed has a few issues,
>>>>>>>>> and
>>>>>>>>>>>> solving it might require
nontrivial solution.
>>>>>>>>>>> I think the main problem of the
inconsistencies and such we've seen is
>>>>>>>>>>> rooted in the erroneous folding of
pointer comparisons. Cleaning up
>>>>>>> the
>>>>>>>>>>> lifetime marker semantics is
actually unrelated and simply not folding
>>>>>>>>>>> as described in A3 should solve the
issue that has been reported.
>>>>>>>>>>> Nevertheless,
>>>>>>>>>>> we should take a crack at lifetime
markers while we are here.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> ~ Johannes
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Juneyoung
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Jan 5, 2021 at 9:37 AM
Johannes Doerfert <
>>>>>>>>>>> johannesdoerfert at gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Juneyoung,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Happy new year :)
>>>>>>>>>>>>>
>>>>>>>>>>>>> After we had a lengthy
discussion on phab last year, I've tried to
>>>>>>>>>>>>> summarize my thoughts,
>>>>>>>>>>>>> especially given that I had
some time to think about things over the
>>>>>>>>>>> break.
>>>>>>>>>>>>> Still, no promises on the
quality ;)
>>>>>>>>>>>>>
>>>>>>>>>>>>> I start with general
questions I asked myself and then go on
>>>>>>> rambling
>>>>>>>>>>>>> about a potential design.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Q1: Is lifetime a given
property or a derived one, thus is it fixed
>>>>>>> or
>>>>>>>>>>>>> modifiable?
>>>>>>>>>>>>>
>>>>>>>>>>>>> This is a question I asked
myself a lot recently. I think it is
>>>>>>>>> derived
>>>>>>>>>>>>> and modifiable,
>>>>>>>>>>>>> at least I hope it is. Only
that would allow transformations I would
>>>>>>>>>>> want
>>>>>>>>>>>>> us to do. Here are some
examples:
>>>>>>>>>>>>>       
https://godbolt.org/z/G8obj3
>>>>>>>>>>>>>       
https://godbolt.org/z/obaTc
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Q2: Is a pointer
comparison, or similar use, extending the lifetime?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Asked differently, can we
hoist a pointer comparison into a region
>>>>>>>>> where
>>>>>>>>>>>>> the pointer is dead?
>>>>>>>>>>>>> This is an important
question which we haven't discussed much as we
>>>>>>>>>>>>> assumed LICM has to work.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The current behavior is
that non-dereferencing uses are not
>>>>>>> extending
>>>>>>>>>>>>> the lifetime and are
>>>>>>>>>>>>> allowed outside of
"lifetime regions" (as indicated by markers).
>>>>>>> They
>>>>>>>>>>>>> will always produce valid
>>>>>>>>>>>>> results. Though, we might
want to think about a lifetime marker that
>>>>>>>>>>>>> spits out a new pointer
>>>>>>>>>>>>> value instead of reusing
the old one.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Q3: Can we use lifetime to
argue about addresses?
>>>>>>>>>>>>>
>>>>>>>>>>>>> The question here is, can
we fold address comparisons based on
>>>>>>>>>>>>> lifetimes, or not.
>>>>>>>>>>>>>
>>>>>>>>>>>>> So far, we fold comparisons
based on "address information". For
>>>>>>>>> example,
>>>>>>>>>>>>> we "know"
globals,
>>>>>>>>>>>>> allocas, and mallocs cannot
be equal to one another. Also, two
>>>>>>>>> distinct
>>>>>>>>>>>>> allocations, for globals
>>>>>>>>>>>>> and allocas, are considered
unequal. Now, the crux is that we have
>>>>>>> to
>>>>>>>>> be
>>>>>>>>>>>>> consistent if we do two
>>>>>>>>>>>>> comparisons, and, as of
now, we are not (bug number missing). Since
>>>>>>>>> the
>>>>>>>>>>>>> backend (or any other place
>>>>>>>>>>>>> for that matter) might
coalesce allocas, their addresses might not
>>>>>>> be
>>>>>>>>>>>>> different after all. If we
>>>>>>>>>>>>> already folded a comparison
to "unequal" we are doomed if we later
>>>>>>>>> have
>>>>>>>>>>>>> a comparison that results
>>>>>>>>>>>>> in "equal".
(Note, this is different from aliasing rules as they can
>>>>>>>>> be
>>>>>>>>>>>>> stricter.)
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Design:
>>>>>>>>>>>>>
>>>>>>>>>>>>> I would hope we can come up
with a model that treats memory "the
>>>>>>>>> same",
>>>>>>>>>>>>> regardless if it is global,
>>>>>>>>>>>>> stack, or heap. I want to
avoid special casing one of them wrt.
>>>>>>>>> lifetime
>>>>>>>>>>>>> as I believe most
optimizations
>>>>>>>>>>>>> would apply to any of them,
potentially for different reasons and
>>>>>>> with
>>>>>>>>>>>>> different gains, but
nevertheless.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Proposal (largely based on
the conversation in phab):
>>>>>>>>>>>>>
>>>>>>>>>>>>> A1: Lifetime is a concept
that talks about memory content *only*.
>>>>>>>>>>>>> Basically, the memory
content is set to
>>>>>>>>>>>>>          undefined by
lifetime markers. It is derived/modifiable as
>>>>>>> it
>>>>>>>>> is
>>>>>>>>>>>>> just an "as-is"
property of the memory
>>>>>>>>>>>>>          content. The
lifetimes of an object, as described by
>>>>>>> markers,
>>>>>>>>>>> might
>>>>>>>>>>>>> change during the
compilation. They
>>>>>>>>>>>>>          might become
smaller if we deduce the object is not accessed
>>>>>>>>> and
>>>>>>>>>>>>> the memory content is not
used, they
>>>>>>>>>>>>>          might become
larger if objects with non-overlapping
>>>>>>> lifetimes
>>>>>>>>> are
>>>>>>>>>>>>> coalesced. (One could see
the latter as
>>>>>>>>>>>>>          introducing a new
object though.)
>>>>>>>>>>>>>
>>>>>>>>>>>>> A2: If we define lifetime
as above, it has nothing to do with the
>>>>>>>>>>>>> address of an object.
Consequently, pointer
>>>>>>>>>>>>>          comparisons and
similar operations are valid outside the
>>>>>>>>> lifetime.
>>>>>>>>>>>>> Loads and stores are as
well, they can
>>>>>>>>>>>>>          even not be
removed "easily". A store followed by a lifetime
>>>>>>>>>>> marker
>>>>>>>>>>>>> or a load following a
lifetime marker
>>>>>>>>>>>>>          is dead or results
in undef respectively.
>>>>>>>>>>>>>
>>>>>>>>>>>>> A3: We could not use
lifetime to argue about addresses. This means
>>>>>>> we
>>>>>>>>>>>>> could/should also not argue
that overlapping
>>>>>>>>>>>>>          lifetimes result
in different addresses. Thus, a comparison
>>>>>>>>>>> between
>>>>>>>>>>>>> the address of two allocas
could not
>>>>>>>>>>>>>          immediately be
folded. That said, there would be special
>>>>>>> cases
>>>>>>>>>>>>> though. Basically, if one
of the allocas does
>>>>>>>>>>>>>          not escape, other
than the comparisons to be folded, we can
>>>>>>>>> fold
>>>>>>>>>>>>> them. Afterwards,
coalescing or splitting
>>>>>>>>>>>>>          would still be
consistent because it is unobservable. The
>>>>>>>>> problem
>>>>>>>>>>>>> we have in-tree is that we
fold even though
>>>>>>>>>>>>>          the address is
still observed (after the fold). It is still
>>>>>>>>>>> unclear
>>>>>>>>>>>>> to me what the impact of
this would be
>>>>>>>>>>>>>          on real code. I
suggested before that we run some
>>>>>>> experiments
>>>>>>>>>>> first
>>>>>>>>>>>>> before we make any decision
whatsoever.
>>>>>>>>>>>>>
>>>>>>>>>>>>> This is pretty much saying
that lifetime markers are
>>>>>>> `memset(undef)`,
>>>>>>>>> as
>>>>>>>>>>>>> you suggested before (I
think).
>>>>>>>>>>>>> There are some
implementation level differences but at the end of
>>>>>>> the
>>>>>>>>>>>>> day they are basically the
same.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Happy to hear some thoughts
on this, especially if it fixes the
>>>>>>>>> problems
>>>>>>>>>>>>> that lead to D93376 in the
first place.
>>>>>>>>>>>>>
>>>>>>>>>>>>> ~ Johannes
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 12/18/20 2:42 AM,
Juneyoung Lee via llvm-dev wrote:
>>>>>>>>>>>>>> Hello all,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> We're discussing
the well-formedness of lifetime.start/end
>>>>>>> intrinsic
>>>>>>>>>>>>> here (
>>>>>>>>>>>>>>
https://reviews.llvm.org/D93376), deciding what is a
>>>>>>> (syntactically
>>>>>>>>> &
>>>>>>>>>>>>>> semantically) valid use
of these intrinsics.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I'd like to gather
more context about the intrinsics.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 1. Is there a frontend
or framework that introduces lifetime call
>>>>>>>>> with
>>>>>>>>>>>>>> non-stack allocated
objects?
>>>>>>>>>>>>>> If exists, which
behavior do they expect from it?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2. Can a block's
lifetime be started bytewise or elementwise?
>>>>>>>>>>>>>> I imagine an
optimization that allow using stack very compactly,
>>>>>>> but
>>>>>>>>>>>>> wonder
>>>>>>>>>>>>>> there is a real-world
use case.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Juneyoung
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
_______________________________________________
>>>>>>>>>>>>>> LLVM Developers mailing
list
>>>>>>>>>>>>>> llvm-dev at
lists.llvm.org
>>>>>>>>>>>>>>
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>>>
>>>>
>>
-- 
Website: https://people.mpi-sws.org/~jung/

Johannes Doerfert via llvm-dev

2021-Jan-18 21:55 UTC

head link

[llvm-dev] lifetime.start/end

On 1/13/21 5:14 AM, Ralf Jung wrote:> Hi Johannes,
>
>>>> The tricky part is, do we see `%c = false` at compile time
>>>> or at runtime. If it's the former, we might run into the
same
>>>> coalescing issue we have for allocas:
>>>>
>>>> p = malloc(1)
>>>> free(q)
>>>> q = malloc(1)
>>>> free(p)
>>>> %c = icmp eq %p, %q
>>>>
>>>> What is `%c` now? It could go either way, depending on your 
>>>> allocator, agreed? 
>>>
>>> Agreed.
>>>
>>>> If so, we need to ensure
>>>> there is not a second `%c2 = icmp eq %p, %q` somewhere if we
want
>>>> to fold `%c` as it needs to be consistent.
>>>
>>> Indeed -- if we constrain the non-deterministic choice that the 
>>> allocator is making, we need to do so consistently.
>>>
>>> On Phabricator, you wrote
>>>
>>>> We could instead coalesce allocas in the middle end (more 
>>>> aggressively and not only based on lifetime markers) in order
to
>>>> fold comparisons of the "then equal" allocas to
"equal".
>>>
>>> So, to me this sounded like you wanted to fold `icmp` to `true` in 
>>> situations like in my example. It looks like I misunderstood.
>>> So is your proposal instead that we can coalesce allocations
*after*
>>> we have ensured that there definitely is no `icmp` on their pointer
>>> any more? Like, for local variables, we first fold away all `icmp` 
>>> and then we can freely coalesce since now it's not observable
that
>>> we are reusing memory?
>>
>> Let me start with:
>>
>> My phab response is "stale" by now. My first email to this
thread was
>> written after I had more time to think about all this:
>> https://lists.llvm.org/pipermail/llvm-dev/2021-January/147594.html
>>
>> Separation of concerns, yes. If you can't observe we coalesced, we 
>> can coalesce, or choose not to. My main argument is that the folding 
>> we do
>> right now is broken, or the coalescing is. We cannot fold, coalesce, 
>> and then fold again (which could be runtime eval), as that leads to
>> inconsistencies.
>
> I think this is equivalent to what I said, namely that we can only 
> coalesce when there are no comparison checks left -- it's not just 
> that we cannot fold icmp any more after colaescing, we cannot have any 
> icmp, folded or not, since runtime-checking of ptr equality will yield 
> the wrong result after coalescing.
> (I have a hard time conceptualizing runtime eval as a kind of
"fold"
> the way you do.^^ So I prefer to think of this in terms of the 
> conditions that have to hold when coalescing happens: no icmp left on 
> the involved pointers.)
No icmp left is not the right condition but I guess it's a shorthand for 
"address is not observed", then we are in agreement. The runtime eval
is little different from static partial evaluation, except that we have 
all the values, but maybe that is just me.

>
>>> However, if we translate alloca to malloc+free, all (almost all?)
of
>>> these allocations *will* overlap in their lifetime, so
'false'
>>> (inequal) is the only thing we could ever fold anything to (and we 
>>> don't have to worry about consistency here since this is the
only
>>> possible choice). I guess after we did that, if there's nothing
else
>>> observing these addresses, we could indeed actually reuse the 
>>> underlying storage...
>>>
>>> Am I on the right track?
>>
>> I guess initially we would only fold to `inequal`, which is fine, 
>> until we start to coalesce locations with non-overlapping lifetime 
>> early.
>> Though, now that I said this, I'm not even sure we could ever fold 
>> after coalescing, thus we could ever coalesce if the addresses (of
both)
>> are "still observed". Maybe folding is not the problem but
coalescing
>> is, though I think both are in the most general sense. I think I can
>> fabricate a reasonable flow of transformation in which early *and* 
>> partial folding is bad:
>>
>> a = alloc
>> b = alloc
>> ... // no use of a or b
>> foo(a)
>> ... // no use of a or b
>> bar(b)
>> ... // no use of a or b
>> c = (a == b)
>>
>> now we assume that the two calls are outlined after we fold `(a == 
>> b)` to `false`:
>>
>> void foocall() {
>> a = alloc
>> foo(a)
>> }
>> void barcall() {
>> a = alloc
>> bar(a)
>> }
>>
>> foocall()
>> barcall()
>> c = true
>>
>> now we assume foo stores away `a` and bar compares it to `b`. That 
>> would be a problem.
>>
>> WDYT?
>
> Hm, yeah, this shows that even outlining is a problem since it can 
> make previously overlapping allocations non-overlapping. In a sense, 
> there is a step happening before outlining here, which is to move 
> around the alloca and explicit free to make the code more like
>
> ... // no use of a or b
> a = alloc
> foo(a)
> free a
> ... // no use of a or b
> b = alloc
> bar(b)
> free b
> ... // no use of a or b
> c = (a == b)
>
> It is this reordering of (de)allocation that is incorrect; the 
> outlining itself is not the problem.
>
> This also reflects my general concern with your approach: there's a 
> lot of information loss. When I write C (or Rust) code, there is no 
> guarantee that different local variables have different addresses. And 
> yet when I translate that code to LLVM by doing
>
> a = alloca
> b = alloca
> [...]
>
> now I have generated LLVM IR where a and b will definitely get 
> different addresses (if alloca works anything like a regular 
> allocation operation). This is a correct translation from C to LLVM 
> IR, but it heavily restricts the possible allocation choices compared 
> to the original C program. So at this point frontends would be 
> well-advised to not generate code like this, but instead carefully 
> place the alloca as late as possible. Frontends will also want to have 
> an explicit "free" for stack-slots to indicate when they are not 
> needed any more.
> It is my understanding that this is exactly where the lifetime.start 
> and lifetime.end markers come in: they are *prescriptive* (in the 
> terminology of your email: they are a given property, and thus fixed 
> -- modulo the usual "as if" rule, of course); they are how the 
> frontend informs the later stages of the constraints on which alloca 
> may overlap and which may not. If the lifetime markers cannot play 
> this role, then something else should. There needs to be *some* way 
> for the frontend to encode this information.
At some point I lost track of what you even want to achieve. Your 
initial problem goes away if we only coalesce and fold comparisons 
between allocations for which the addresses are not observed.
Let's do that and thereby benefit from all the good things we can get 
from lifetime markers and coalescing, for alloca/malloc/...
Let's not go down the brittle road of syntactic limitations. It doesn't 
allow us to do a lot of things and will come with a price as we continue.

FWIW, I think some part of the misconception is that two allocas "will 
definitely get different addresses" which you argue "heavily restricts
the possible allocation choices compared to the original C program".
The addresses are different but that is only interesting if you observe 
them. Take the C code: `{int a,b, c = (&a == &b); use(a,c);}`. I can 
happily coalesce `a` and `b` after the comparison was folded, or `b` and 
`c` even if the comparison is still in place. For neither transformation 
I need
lifetime markers to do it in IR.

A lot of my arguments around alloca/malloc/... is that we can augment 
the information from the frontend by determining if an address is 
observed or not. If it is not, we can limit its allocation, coalesce it, 
etc. Lifetime markers can help us do this but they are not the only way.
So, basically, we do not need lifetime markers to coalesce 
alloca/malloc/... but that they can help.

Let's try to get out of this rabbit hole. If you have a problem with the 
proposal as outlined in 
https://lists.llvm.org/pipermail/llvm-dev/2021-January/147594.html , 
please let me know. I assume dealing with observed addresses different 
than with non-observede ones (implied by noescape) is the way forward now.

~ Johannes


>
> Kind regards,
> Ralf
>
>
>>
>>
>>>
>>> Kind regards,
>>> Ralf
>>>
>>>>
>>>> Does that make some sense?
>>>>
>>>> ~ Johannes
>>>>
>>>>
>>>>>
>>>>> Kind regards,
>>>>> Ralf
>>>>>
>>>>>>
>>>>>>
>>>>>> On 1/7/21 10:12 PM, Juneyoung Lee wrote:
>>>>>>> On Fri, Jan 8, 2021 at 8:54 AM Johannes Doerfert 
>>>>>>> <johannesdoerfert at gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> On 1/7/21 4:55 PM, Juneyoung Lee wrote:
>>>>>>>>> It is, alloca must have its integral
address decided at its
>>>>>>>>> *allocation*
>>>>>>>>> time as well, like 0xFF00... .
>>>>>>>>>
>>>>>>>>> For example, if 0xFFFF0000 ~ 0xFF000000 is
already allocated,
>>>>>>>>> the new
>>>>>>>>> alloca cannot be placed there unless it is
not observed in the
>>>>>>>>> future,
>>>>>>>>> according to your proposal.
>>>>>>>>>
>>>>>>>>> But how do we know from the current state
whether the stack
>>>>>>>>> variable is
>>>>>>>>> going to be observed or not?
>>>>>>>> With that argument you could allocate all but 4
bytes, do a
>>>>>>>> malloc(4)
>>>>>>>> and know the address of the returned pointer
(assuming it is
>>>>>>>> not null).
>>>>>>>>
>>>>>>>> What I try to say is, either your scenario is
part of the model
>>>>>>>> and
>>>>>>>> everything we do is broken as you could
"observe" addresses
>>>>>>>> passively,
>>>>>>>> *or*, the abstract machine we use for semantic
reasoning
>>>>>>>> doesn't permit
>>>>>>>> the above reasoning. I really hope for the
latter.
>>>>>>>>
>>>>>>> That's a very valid point..!
>>>>>>>
>>>>>>> Actually, I have a memory model that addresses this
problem.
>>>>>>> The gist is that we can interpret each allocation
instruction as
>>>>>>> *creating
>>>>>>> 2 blocks* and nondeterministically returning one of
them.
>>>>>>> The other one is marked as invalid but not freed
immediately.
>>>>>>> Deallocation
>>>>>>> frees both blocks.
>>>>>>> If there is not enough space for having two blocks,
it is
>>>>>>> treated as
>>>>>>> out-of-memory.
>>>>>>>
>>>>>>> It makes guessing the address of an allocation
according to
>>>>>>> memory layout
>>>>>>> invalid UB.
>>>>>>>
>>>>>>> p = malloc(4)
>>>>>>> // If p != null, it is guaranteed there was at
least two 4 byte
>>>>>>> slots
>>>>>>> available, say 0x100~0x104 and 0x110~0x114.
>>>>>>> // Two blocks are allocated at 0x110 and 0x114, and
one of the
>>>>>>> addresses is
>>>>>>> nondeterministically returned.
>>>>>>> // By the nature of nondeterminism, all executions
below should be
>>>>>>> well-defined regardless of the address of p.
>>>>>>> *(int*)0x100 = 10; // In an execution where p is
0x110, this
>>>>>>> raises UB.
>>>>>>>
>>>>>>> The paper link is here
<https://dl.acm.org/doi/10.1145/3276495>
>>>>>>> FYI.
>>>>>>
>>>>>> Nice! Thanks for the link :)
>>>>>>
>>>>>>
>>>>>>> To argue differently: Who is to say there is a
stack, or only one,
>>>>>>>> or that alloca allocates memory "on the
one stack"? That is not
>>>>>>>> part of
>>>>>>>> the IR, IMHO. I can write a machine on which
alloca lowers to
>>>>>>>> malloc,
>>>>>>>> I don't even need the free during stack
unwind but that I could
>>>>>>>> do as
>>>>>>>> well if I wanted to.
>>>>>>>
>>>>>>> This is right, the example was for illustrative
purposes. IR
>>>>>>> does not
>>>>>>> enforce alloca to be placed at 'stack'.
>>>>>>>
>>>>>>>
>>>>>>>>> Making icmp/ptrtoint yield poison will
still make loop
>>>>>>>>> versioning or
>>>>>>>>> pointer rewriting transformations unsound
because these
>>>>>>>>> operations now
>>>>>>>> can
>>>>>>>>> create poison (even if pointers are
noundef).
>>>>>>>> I did not say they yield poison, at least I did
not try to say
>>>>>>>> that.
>>>>>>>> What are you referring to exactly?
>>>>>>>>
>>>>>>> I was referring to this:
>>>>>>>
>>>>>>>>> But this makes ptrtoint/icmp make
UB-raising instructions, which
>>>>>>>>> contradicts with what LLVM does.
>>>>>>>> As with other violation of attributes I would,
on first though,
>>>>>>>> suggest
>>>>>>>> to produce poison, not UB.
>>>>>>> But it is more about the (imaginary) attribute, so
maybe I was
>>>>>>> slightly out
>>>>>>> of topic.
>>>>>>>
>>>>>>>> 2) is fine, I think the suggestion semantically
makes sense
>>>>>>>> perfectly. 1)
>>>>>>>>> is something I'm concerned about now.
>>>>>>>>>
>>>>>>>>> There are more than pointer foldings, such
as rewriting such
>>>>>>>>> expression,
>>>>>>>>> code motion ptr cmp, introduce ptr cmp,
etc. There is also
>>>>>>>>> analysis
>>>>>>>> relying
>>>>>>>>> on ptr cmp.
>>>>>>>>> Defining the correctness of each of them is
something we want
>>>>>>>>> to avoid,
>>>>>>>> and
>>>>>>>>> maybe that's why we want to define
precise semantics for things.
>>>>>>>> I don't get the point. My proposal does not
change the
>>>>>>>> semantics of
>>>>>>>> pointer comparisons, at all. I explicitly
mentioned that in the
>>>>>>>> last
>>>>>>>> email.
>>>>>>>>
>>>>>>> Oh okay, I thought it was a part of the lifetime
proposal, but
>>>>>>> it seems
>>>>>>> more like a separate thing.
>>>>>>> I agree that this requires performance impact.
>>>>>>> Also investigation of existing transformations
would be needed;
>>>>>>> Alive2's
>>>>>>> pointer comparison is doing approximation yet, but
if it becomes
>>>>>>> fully
>>>>>>> precise, it will show something from running LLVM
unit tests I
>>>>>>> believe..! :)
>>>>>>>
>>>>>>>> I think this will be less aggressive and may
give nice feedback to
>>>>>>>>> potential projects that are using lifetime
with non-alloca.
>>>>>>>> The lifetime marker debate, basically 2) above,
is orthogonal
>>>>>>>> to the
>>>>>>>> problem
>>>>>>>> you try to solve. It got mixed in as lifetime
markers were used by
>>>>>>>> StackColoring
>>>>>>>> to perform coalescing but that is coincidental.
You can (arguably)
>>>>>>>> coalesce stack
>>>>>>>> allocations regardless of lifetime markers and
with 1) such a
>>>>>>>> transformation
>>>>>>>> (w/ and w/o lifetime markers) would actually be
sound.
>>>>>>>>
>>>>>>>>
>>>>>>>>> At the end, we can implement IR writer that
lowers lifetime with
>>>>>>>> non-alloca
>>>>>>>>> into memset(undef). WDYT?
>>>>>>>> Yeah, 2) is orthogonal and we can lower it that
way. Unsure if
>>>>>>>> it is
>>>>>>>> helpful
>>>>>>>> but we can certainly define it that way in the
LangRef.
>>>>>>>>
>>>>>>> Okay, thanks!
>>>>>>>
>>>>>>>
>>>>>>>> ~ Johannes
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Juneyoung
>>>>>>>>>
>>>>>>>>> p.s. The reply was late, sorry. I think I
can spend more time
>>>>>>>>> on this
>>>>>>>> today.
>>>>>>>>> On Thu, Jan 7, 2021 at 9:02 AM Johannes
Doerfert <
>>>>>>>> johannesdoerfert at gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> On 1/6/21 4:33 PM, Juneyoung Lee wrote:
>>>>>>>>>>>>> Stepwisely defining the
semantics of instructions is a
>>>>>>>>>>>>> desirable
>>>>>>>>>>> direction
>>>>>>>>>>>>> IMO.
>>>>>>>>>>>> I'm confused. What in the
proposal would prevent us from
>>>>>>>>>>>> defining
>>>>>>>>>>>> the semantics of instructions,
or force us to do it in an
>>>>>>>>>>>> "undesirable
>>>>>>>>>>> way"?
>>>>>>>>>>>
>>>>>>>>>>> I meant it would be great if the
output state after
>>>>>>>>>>> executing an
>>>>>>>>>>> instruction can be described using
its input state.
>>>>>>>>>>> (that was the meaning of
'stepwise semantics', I should have
>>>>>>>>>>> been more
>>>>>>>>>>> clear about this)
>>>>>>>>>>>
>>>>>>>>>>> For example, the semantics of
'z = add x y' can be defined
>>>>>>>>>>> as follows:
>>>>>>>>>>> Given an input state s, next state
s' = s[z -> s(x) + s(y)]
>>>>>>>>>>> where s(x) is the value of x in the
previous state, and s[z
>>>>>>>>>>> -> v] is a
>>>>>>>>>>> state with z updated to v.
>>>>>>>>>>>
>>>>>>>>>>> Another example that involves
memory: the semantics of 'i =
>>>>>>>>>>> ptrtoint p'
>>>>>>>>>> can
>>>>>>>>>>> be defined as follows:
>>>>>>>>>>> Given an input state s, next state
s' = s[i ->
>>>>>>>>>>> s(p).obj.address +
>>>>>>>>>>> s(p).offset]
>>>>>>>>>>> where obj.address is the begin
address of a memory object
>>>>>>>>>>> obj pointed
>>>>>>>> by
>>>>>>>>>> p
>>>>>>>>>>> & offset is p's byte
offset. (Imagine a pointer to the
>>>>>>>>>>> offset of some
>>>>>>>>>> char
>>>>>>>>>>> array).
>>>>>>>>>>> Note that ptrtoint & add can be
nicely defined w.r.t the
>>>>>>>>>>> input state.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Now, the instruction that we're
facing is 'p = alloca'.
>>>>>>>>>>> To describe the output state after
executing 'p = alloca',
>>>>>>>>>>> the address
>>>>>>>> of
>>>>>>>>>>> new alloca should be determined.
>>>>>>>>>>> If observedness is involved, we
need to know the future
>>>>>>>>>>> state again. :/
>>>>>>>>>> We
>>>>>>>>>>> don't know whether the alloca
is going to be observed or not
>>>>>>>>>>> without
>>>>>>>>>> seeing
>>>>>>>>>>> the future.
>>>>>>>>>>> This is the problem of the current
lifetime intrinsics as well.
>>>>>>>>>> No, you mix things up here.
>>>>>>>>>>
>>>>>>>>>> Nobody proposed to modify the semantics
of `alloca`.
>>>>>>>>>> `alloca` provides you with a fresh,
unobserved block of
>>>>>>>>>> dereferenceable memory that is
implicitly freed as the stack
>>>>>>>>>> unwinds. That is it. No context
necessary.
>>>>>>>>>>
>>>>>>>>>> If you want to modify the IR, you need
to argue the observable
>>>>>>>>>> semantics which is nothing new. That
this might require more
>>>>>>>>>> than
>>>>>>>>>> a peephole view of the program is also
not new.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> One possible approach to resolve
this is adding an
>>>>>>>>>>> 'unobserved' flag to
>>>>>>>>>>> alloca instruction (similar to what
was suggested by Nicolai).
>>>>>>>>>>> And, we can say that if alloca with
'unobserved' is used by
>>>>>>>>>> ptrtoint/icmp,
>>>>>>>>>>> it is UB.
>>>>>>>>>> The flag can be added, like we add
other attributes. It
>>>>>>>>>> should not
>>>>>>>>>> be required for any optimization we
talked about though. It
>>>>>>>>>> basically
>>>>>>>>>> is a way to manifest derived or given
information into the IR.
>>>>>>>>>>
>>>>>>>>>> Attribute deduction, as well as
frontends with domain knowledge,
>>>>>>>>>> can add such information. The flag we
discussed in phab was
>>>>>>>>>> not even
>>>>>>>>>> sufficient for all the transformation
examples I presented in
>>>>>>>>>> my mail,
>>>>>>>>>> that is why I extended my  argument. We
could still have a
>>>>>>>>>> "noescape"
>>>>>>>>>> flag for allocas, but I'm not sure
how useful that really is.
>>>>>>>>>> We can
>>>>>>>>>> certainly deduce it and manifest it,
unsure if we have domain
>>>>>>>>>> knowledge
>>>>>>>>>> we can use for non-trivial cases
though.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> But this makes ptrtoint/icmp make
UB-raising instructions,
>>>>>>>>>>> which
>>>>>>>>>>> contradicts with what LLVM does.
>>>>>>>>>> As with other violation of attributes I
would, on first
>>>>>>>>>> though, suggest
>>>>>>>>>> to produce poison, not UB.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> Also, existing optimizations like
loop versioning can introduce
>>>>>>>>>>> ptrtoint/pointer comparisons too.
>>>>>>>>>> Sure. I am not certain why that is a
problem. I get the
>>>>>>>>>> feeling things
>>>>>>>>>> are still mixed up here.
>>>>>>>>>>
>>>>>>>>>> What I proposed is twofold:
>>>>>>>>>>
>>>>>>>>>> 1) We stop folding comparisons between
different allocas if
>>>>>>>>>> changing the
>>>>>>>>>> address
>>>>>>>>>>       of both might be observable.
Thus, if both might have
>>>>>>>>>> their address
>>>>>>>>>> "taken"/escaped,
>>>>>>>>>>       other than the comparisons we
want to fold, we cannot
>>>>>>>>>> proceed.
>>>>>>>>>>
>>>>>>>>>> 2) We define lifetime markers to mean
`memset(undef)`.
>>>>>>>>>>
>>>>>>>>>> The first should be sufficient for the
problem you were
>>>>>>>>>> trying to solve
>>>>>>>>>> in the
>>>>>>>>>> first place. The second makes lifetime
markers less weird.
>>>>>>>>>> Note that 1)
>>>>>>>>>> is not changing
>>>>>>>>>> the semantics of the IR. We basically
just argue there is a
>>>>>>>>>> bug in our
>>>>>>>>>> instcombine right
>>>>>>>>>> now as we do not check all necessary
preconditions.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> I see that there are other
questions that I didn't answer
>>>>>>>>>>> yet, but let
>>>>>>>> me
>>>>>>>>>>> answer this first to limit the
length of the text :)
>>>>>>>>>> Sure, we can split the discussion :)
>>>>>>>>>>
>>>>>>>>>> ~ Johannes
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Juneyoung
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Jan 7, 2021 at 3:36 AM
Johannes Doerfert <
>>>>>>>>>> johannesdoerfert at gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> On 1/5/21 8:00 PM, Juneyoung
Lee wrote:
>>>>>>>>>>>>> Hi Johannes,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I read your proposal and
thought about the model.
>>>>>>>>>>>> Cool, thanks!
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> As you concerned in A3,
certain programs may be valid only
>>>>>>>>>>>>> when
>>>>>>>> memory
>>>>>>>>>>>>> blocks with overlapping
lifetimes have disjoint addresses.
>>>>>>>>>>>>> Look at this example
(I'm using malloc, but alloca also
>>>>>>>>>>>>> works):
>>>>>>>>>>>>>
>>>>>>>>>>>>> p1 = malloc(4)
>>>>>>>>>>>>> p2 = malloc(4) // for
brevity, assume that there as enough
>>>>>>>>>>>>> space & p1
>>>>>>>>>> and
>>>>>>>>>>>>> p2 != null
>>>>>>>>>>>>> set<char*> s;
>>>>>>>>>>>>> s.insert(p1); s.insert(p2);
// If the second insert did
>>>>>>>>>>>>> nothing, it
>>>>>>>>>> would
>>>>>>>>>>>>> be surprise to programmers
>>>>>>>>>>>>> work(s);
>>>>>>>>>>>>> free(data1)
>>>>>>>>>>>>> free(data2)
>>>>>>>>>>>>>
>>>>>>>>>>>>> Clearly, IR semantics
should guarantee that escaped blocks
>>>>>>>>>>>>> are
>>>>>>>>>> disjoint.
>>>>>>>>>>>> It
>>>>>>>>>>>>> would be great for
verification tools on LLVM IR to be
>>>>>>>>>>>>> able to answer
>>>>>>>>>>>> that
>>>>>>>>>>>>> the second insert will
succeed.
>>>>>>>>>>>> I agree, the second insert
should succeed, assuming `p1 &&
>>>>>>>>>>>> p2`.
>>>>>>>>>>>> I don't think my proposal
would in any way impact the
>>>>>>>>>>>> program above,
>>>>>>>>>>>> if anything the A3 reasoning
makes sure such a program with
>>>>>>>>>>>> allocas
>>>>>>>>>>>> is not miscompiled.
>>>>>>>>>>>>
>>>>>>>>>>>> I'm also not sure I
understand what you try to argue for.
>>>>>>>>>>>> Maybe
>>>>>>>>>>>> elaborate a bit what it is you
think is bad or needs to be
>>>>>>>>>>>> changed.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> The problem is that
definition of escapedness is not clear
>>>>>>>>>>>>> at the
>>>>>>>>>>>> semantic
>>>>>>>>>>>>> level. Describing the IR
semantics w.r.t. LLVM's escaped
>>>>>>>>>>>>> analysis
>>>>>>>>>>>> function
>>>>>>>>>>>>> isn't something we
want.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Semantic definition of
escapedness of a pointer seems
>>>>>>>>>>>>> hard, I mean
>>>>>>>> in a
>>>>>>>>>>>>> stepwise manner.
>>>>>>>>>>>>> It isn't a single
instruction such as 'escape i8* ptr',
>>>>>>>>>>>>> and we need
>>>>>>>> to
>>>>>>>>>>>> look
>>>>>>>>>>>>> over all instructions in
the function. For example,
>>>>>>>>>>>>> '(int)(p+1) -
>>>>>>>>>> (int)p'
>>>>>>>>>>>>> isn't semantically
escaping the pointer p because the
>>>>>>>>>>>>> result is 1
>>>>>>>>>>>>> regardless of the value of
p.
>>>>>>>>>>>>> Stepwisely defining the
semantics of instructions is a
>>>>>>>>>>>>> desirable
>>>>>>>>>>>> direction
>>>>>>>>>>>>> IMO.
>>>>>>>>>>>> I'm confused. What in the
proposal would prevent us from
>>>>>>>>>>>> defining
>>>>>>>>>>>> the semantics of instructions,
or force us to do it in an
>>>>>>>>>>>> "undesirable
>>>>>>>>>>>> way"?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> In practice, syntactically
checking escapedness + nice
>>>>>>>>>>>>> engineering
>>>>>>>>>> might
>>>>>>>>>>>>> not break optimizations in
most cases (as undef/poison
>>>>>>>>>>>>> did); but it
>>>>>>>>>> would
>>>>>>>>>>>>> be great to move to another
level, since LLVM IR is used
>>>>>>>>>>>>> in so many
>>>>>>>>>>>> places
>>>>>>>>>>>>> :)
>>>>>>>>>>>> The property under which you
can coalesce objects is simple:
>>>>>>>>>>>>       It is not observable.
>>>>>>>>>>>>
>>>>>>>>>>>> Now, if the address of one of
the two objects you coalesce
>>>>>>>>>>>> is not
>>>>>>>>>>>> observed, coalescing is not
observable. That is a
>>>>>>>>>>>> sufficient condition
>>>>>>>>>>>> not a necessary one. Pointer
"escaping" is one step
>>>>>>>>>>>> further. If the
>>>>>>>>>>>> address doesn't escape it
is not observed. This does not
>>>>>>>>>>>> mean the
>>>>>>>>>>>> "semantic conditions for
coalescing", e.g., for the purpose of
>>>>>>>>>> translation
>>>>>>>>>>>> validation, is supposed to be
build on top of our
>>>>>>>>>>>> "definition of
>>>>>>>>>> escaping
>>>>>>>>>>>> pointers". That said, we
use "does not escape" as a
>>>>>>>>>>>> precondition for
>>>>>>>>>>>> various transformation and
I'm unsure what is any different
>>>>>>>>>>>> now. The
>>>>>>>>>>>> entire escape argument is only
used in the validity of the
>>>>>>>>>>>> pointer
>>>>>>>>>> folding.
>>>>>>>>>>>> Similarly, we can fold a
comparison of a noalias pointer
>>>>>>>>>>>> with another
>>>>>>>>>> one
>>>>>>>>>>>> if the former does not escape
(and both are dereferenced
>>>>>>>>>>>> and one is
>>>>>>>>>>>> written for sure).
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> The pointer comparison is
another beast. It indeed has a
>>>>>>>>>>>>> few issues,
>>>>>>>>>> and
>>>>>>>>>>>>> solving it might require
nontrivial solution.
>>>>>>>>>>>> I think the main problem of the
inconsistencies and such
>>>>>>>>>>>> we've seen is
>>>>>>>>>>>> rooted in the erroneous folding
of pointer comparisons.
>>>>>>>>>>>> Cleaning up
>>>>>>>> the
>>>>>>>>>>>> lifetime marker semantics is
actually unrelated and simply
>>>>>>>>>>>> not folding
>>>>>>>>>>>> as described in A3 should solve
the issue that has been
>>>>>>>>>>>> reported.
>>>>>>>>>>>> Nevertheless,
>>>>>>>>>>>> we should take a crack at
lifetime markers while we are here.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> ~ Johannes
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Juneyoung
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Jan 5, 2021 at 9:37
AM Johannes Doerfert <
>>>>>>>>>>>> johannesdoerfert at
gmail.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Juneyoung,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Happy new year :)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> After we had a lengthy
discussion on phab last year, I've
>>>>>>>>>>>>>> tried to
>>>>>>>>>>>>>> summarize my thoughts,
>>>>>>>>>>>>>> especially given that I
had some time to think about
>>>>>>>>>>>>>> things over the
>>>>>>>>>>>> break.
>>>>>>>>>>>>>> Still, no promises on
the quality ;)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I start with general
questions I asked myself and then go on
>>>>>>>> rambling
>>>>>>>>>>>>>> about a potential
design.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Q1: Is lifetime a given
property or a derived one, thus
>>>>>>>>>>>>>> is it fixed
>>>>>>>> or
>>>>>>>>>>>>>> modifiable?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This is a question I
asked myself a lot recently. I think
>>>>>>>>>>>>>> it is
>>>>>>>>>> derived
>>>>>>>>>>>>>> and modifiable,
>>>>>>>>>>>>>> at least I hope it is.
Only that would allow
>>>>>>>>>>>>>> transformations I would
>>>>>>>>>>>> want
>>>>>>>>>>>>>> us to do. Here are some
examples:
>>>>>>>>>>>>>>       
https://godbolt.org/z/G8obj3
>>>>>>>>>>>>>>       
https://godbolt.org/z/obaTc
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Q2: Is a pointer
comparison, or similar use, extending
>>>>>>>>>>>>>> the lifetime?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Asked differently, can
we hoist a pointer comparison into
>>>>>>>>>>>>>> a region
>>>>>>>>>> where
>>>>>>>>>>>>>> the pointer is dead?
>>>>>>>>>>>>>> This is an important
question which we haven't discussed
>>>>>>>>>>>>>> much as we
>>>>>>>>>>>>>> assumed LICM has to
work.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The current behavior is
that non-dereferencing uses are not
>>>>>>>> extending
>>>>>>>>>>>>>> the lifetime and are
>>>>>>>>>>>>>> allowed outside of
"lifetime regions" (as indicated by
>>>>>>>>>>>>>> markers).
>>>>>>>> They
>>>>>>>>>>>>>> will always produce
valid
>>>>>>>>>>>>>> results. Though, we
might want to think about a lifetime
>>>>>>>>>>>>>> marker that
>>>>>>>>>>>>>> spits out a new pointer
>>>>>>>>>>>>>> value instead of
reusing the old one.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Q3: Can we use lifetime
to argue about addresses?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The question here is,
can we fold address comparisons
>>>>>>>>>>>>>> based on
>>>>>>>>>>>>>> lifetimes, or not.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> So far, we fold
comparisons based on "address
>>>>>>>>>>>>>> information". For
>>>>>>>>>> example,
>>>>>>>>>>>>>> we "know"
globals,
>>>>>>>>>>>>>> allocas, and mallocs
cannot be equal to one another.
>>>>>>>>>>>>>> Also, two
>>>>>>>>>> distinct
>>>>>>>>>>>>>> allocations, for
globals
>>>>>>>>>>>>>> and allocas, are
considered unequal. Now, the crux is
>>>>>>>>>>>>>> that we have
>>>>>>>> to
>>>>>>>>>> be
>>>>>>>>>>>>>> consistent if we do two
>>>>>>>>>>>>>> comparisons, and, as of
now, we are not (bug number
>>>>>>>>>>>>>> missing). Since
>>>>>>>>>> the
>>>>>>>>>>>>>> backend (or any other
place
>>>>>>>>>>>>>> for that matter) might
coalesce allocas, their addresses
>>>>>>>>>>>>>> might not
>>>>>>>> be
>>>>>>>>>>>>>> different after all. If
we
>>>>>>>>>>>>>> already folded a
comparison to "unequal" we are doomed if
>>>>>>>>>>>>>> we later
>>>>>>>>>> have
>>>>>>>>>>>>>> a comparison that
results
>>>>>>>>>>>>>> in "equal".
(Note, this is different from aliasing rules
>>>>>>>>>>>>>> as they can
>>>>>>>>>> be
>>>>>>>>>>>>>> stricter.)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Design:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I would hope we can
come up with a model that treats
>>>>>>>>>>>>>> memory "the
>>>>>>>>>> same",
>>>>>>>>>>>>>> regardless if it is
global,
>>>>>>>>>>>>>> stack, or heap. I want
to avoid special casing one of
>>>>>>>>>>>>>> them wrt.
>>>>>>>>>> lifetime
>>>>>>>>>>>>>> as I believe most
optimizations
>>>>>>>>>>>>>> would apply to any of
them, potentially for different
>>>>>>>>>>>>>> reasons and
>>>>>>>> with
>>>>>>>>>>>>>> different gains, but
nevertheless.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Proposal (largely based
on the conversation in phab):
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> A1: Lifetime is a
concept that talks about memory content
>>>>>>>>>>>>>> *only*.
>>>>>>>>>>>>>> Basically, the memory
content is set to
>>>>>>>>>>>>>>          undefined by
lifetime markers. It is
>>>>>>>>>>>>>> derived/modifiable as
>>>>>>>> it
>>>>>>>>>> is
>>>>>>>>>>>>>> just an
"as-is" property of the memory
>>>>>>>>>>>>>>          content. The
lifetimes of an object, as
>>>>>>>>>>>>>> described by
>>>>>>>> markers,
>>>>>>>>>>>> might
>>>>>>>>>>>>>> change during the
compilation. They
>>>>>>>>>>>>>>          might become
smaller if we deduce the object is
>>>>>>>>>>>>>> not accessed
>>>>>>>>>> and
>>>>>>>>>>>>>> the memory content is
not used, they
>>>>>>>>>>>>>>          might become
larger if objects with non-overlapping
>>>>>>>> lifetimes
>>>>>>>>>> are
>>>>>>>>>>>>>> coalesced. (One could
see the latter as
>>>>>>>>>>>>>>          introducing a
new object though.)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> A2: If we define
lifetime as above, it has nothing to do
>>>>>>>>>>>>>> with the
>>>>>>>>>>>>>> address of an object.
Consequently, pointer
>>>>>>>>>>>>>>          comparisons
and similar operations are valid
>>>>>>>>>>>>>> outside the
>>>>>>>>>> lifetime.
>>>>>>>>>>>>>> Loads and stores are as
well, they can
>>>>>>>>>>>>>>          even not be
removed "easily". A store followed
>>>>>>>>>>>>>> by a lifetime
>>>>>>>>>>>> marker
>>>>>>>>>>>>>> or a load following a
lifetime marker
>>>>>>>>>>>>>>          is dead or
results in undef respectively.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> A3: We could not use
lifetime to argue about addresses.
>>>>>>>>>>>>>> This means
>>>>>>>> we
>>>>>>>>>>>>>> could/should also not
argue that overlapping
>>>>>>>>>>>>>>          lifetimes
result in different addresses. Thus, a
>>>>>>>>>>>>>> comparison
>>>>>>>>>>>> between
>>>>>>>>>>>>>> the address of two
allocas could not
>>>>>>>>>>>>>>          immediately be
folded. That said, there would be
>>>>>>>>>>>>>> special
>>>>>>>> cases
>>>>>>>>>>>>>> though. Basically, if
one of the allocas does
>>>>>>>>>>>>>>          not escape,
other than the comparisons to be
>>>>>>>>>>>>>> folded, we can
>>>>>>>>>> fold
>>>>>>>>>>>>>> them. Afterwards,
coalescing or splitting
>>>>>>>>>>>>>>          would still be
consistent because it is
>>>>>>>>>>>>>> unobservable. The
>>>>>>>>>> problem
>>>>>>>>>>>>>> we have in-tree is that
we fold even though
>>>>>>>>>>>>>>          the address is
still observed (after the fold).
>>>>>>>>>>>>>> It is still
>>>>>>>>>>>> unclear
>>>>>>>>>>>>>> to me what the impact
of this would be
>>>>>>>>>>>>>>          on real code.
I suggested before that we run some
>>>>>>>> experiments
>>>>>>>>>>>> first
>>>>>>>>>>>>>> before we make any
decision whatsoever.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This is pretty much
saying that lifetime markers are
>>>>>>>> `memset(undef)`,
>>>>>>>>>> as
>>>>>>>>>>>>>> you suggested before (I
think).
>>>>>>>>>>>>>> There are some
implementation level differences but at
>>>>>>>>>>>>>> the end of
>>>>>>>> the
>>>>>>>>>>>>>> day they are basically
the same.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Happy to hear some
thoughts on this, especially if it
>>>>>>>>>>>>>> fixes the
>>>>>>>>>> problems
>>>>>>>>>>>>>> that lead to D93376 in
the first place.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ~ Johannes
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 12/18/20 2:42 AM,
Juneyoung Lee via llvm-dev wrote:
>>>>>>>>>>>>>>> Hello all,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> We're
discussing the well-formedness of lifetime.start/end
>>>>>>>> intrinsic
>>>>>>>>>>>>>> here (
>>>>>>>>>>>>>>>
https://reviews.llvm.org/D93376), deciding what is a
>>>>>>>> (syntactically
>>>>>>>>>> &
>>>>>>>>>>>>>>> semantically) valid
use of these intrinsics.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I'd like to
gather more context about the intrinsics.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 1. Is there a
frontend or framework that introduces
>>>>>>>>>>>>>>> lifetime call
>>>>>>>>>> with
>>>>>>>>>>>>>>> non-stack allocated
objects?
>>>>>>>>>>>>>>> If exists, which
behavior do they expect from it?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2. Can a
block's lifetime be started bytewise or
>>>>>>>>>>>>>>> elementwise?
>>>>>>>>>>>>>>> I imagine an
optimization that allow using stack very
>>>>>>>>>>>>>>> compactly,
>>>>>>>> but
>>>>>>>>>>>>>> wonder
>>>>>>>>>>>>>>> there is a
real-world use case.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> Juneyoung
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
_______________________________________________
>>>>>>>>>>>>>>> LLVM Developers
mailing list
>>>>>>>>>>>>>>> llvm-dev at
lists.llvm.org
>>>>>>>>>>>>>>>
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>>>>
>>>>>
>>>
>

llvm dev - Jan 2021 - lifetime.start/end

[llvm-dev] lifetime.start/end

[llvm-dev] lifetime.start/end