thr3ads.net - llvm dev - [llvm-dev] lifetime.start/end [Feb 2021]

If this information is useful, please help other people find it:
Share via:

Ralf Jung via llvm-dev

2021-Jan-30 13:40 UTC

[llvm-dev] lifetime.start/end

Hi Johannes,
>>> I was opposing the proposal because of the brittle syntactic
restrictions. In
>>> my attempt to justify not having those I also tried to resolve the
above
>>> inconsistency in a more general manner because I thought that was
what it was
>>> all about. I believe that we either have to argue the address of a
dead
>>> object is unusable/undetermined, basically the direction of James
comment, or
>>> we need to decouple lifetime markers from locations.
>>
>> Folding to false and coalescing are inconsistent with each other, yes.
It is
>> just wrong to perform both of these on the above program, I would say
(if we
>> interpret it with LLVM semantics, which I assume has no "pointer
zapping"). Is
>> that up for debate?
>>
>> However, I do not see the immediate connection to lifetime markers --
the same
>> would be true for a program using malloc/free. Coalescing is
permissible
>> whenever lifetimes are disjoint. Folding to false is only permissible
if the
>> pointers are *definitely not equal*, e.g. if their lifetimes overlap.
>>
> I still believe folding can be done regardless of their lifetime, as long
as it
> is done consistent. Fold all pointer observations and the actual
> placement is not observable, we went down that road.
Sure, but that's the kind of exception that always applies when discussing 
optimizations.
But to make my statement more precise: you cannot fold comparison of *escaped* 
pointers to false unless the pointers are *definitely not equal*, e.g. if their 
lifetimes overlap.
> I agree with the malloc/free comparison, I did argue that the entire time,
or I
> tried.
> 
> If there is no/little connection to lifetime markers, I assume 
> https://reviews.llvm.org/D93376 did not fix the inconsistency then. It was,
> apparently, just an approach to clarify what a lifetime marker is, right? I
> might be more confused than before. I believe you should equate
> lifetime markers and memset(undef) and put the entire discussion about 
> folding/coalescing onto a foundation that is unrelated to lifetime
> markers.
My understanding is that lifetime markers are *explicitly intended to support 
coalescing* by permitting later phases to put allocas with disjoint lifetimes 
into the same stack slot. D93376 intended to keep it that way; fundamentally 
changing what lifetime markers are about by making them only about the content 
and not the address of the alloca seemed like a much more invasive change.
Am I misunderstanding something?
>> The syntactic condition also provides a clear path to evaluate whether
any
>> given folding or coalescing is correct. 
> 
> I disagree. The syntactic conditions, at least as I read them, allow the
above
> program with lifetime markers (as we have it now) but did not
> suggest it would be wrong to fold and later coalesce. Or maybe I missed why
the
> syntactic restrictions would prevent that, please explain then.
Where in that program would you place the lifetime markers? I assume at the 
beginning and end of the local variable's scope?
So that would become something like (also reflecting that allocas are all at the
top):

void foo() {
    intptr_t xp, yp;
    int x, y;
    { lifetime.start(&x); xp = &x; bar(&x); lifetime.end(&x); }
    { lifetime.start(&y); yp = &y; bar(&y); lifetime.end(&y); }
    print(xp == yp);               // can we fold this comparison to false?
}

I think evaluating correctness of folding to false here is quite clear (I assume
this is just using C syntax to represent LLVM IR, so we applyLLVM IR semantics):
- The program as-written is non-deterministic; x and y have disjoint lifetime so
they could have the same or a different address.
- When the compiler decides to fold the comparison to false, it means it has to 
commit to one of the possible non-det choices, and needs to stick to that 
choice. So *just* replacing "xp == yp" by "false" is
incorrect in the sense that
the new program has a possible behavior that did not occur with the old program:
"true false" can now be printed (*if* the allocator chooses to put x
and y into
the same location), whereas the original program would never print "true
false".
Note that we do not need to talk about coalescing here, we are just exploiting 
the fact that allocation is non-deterministic and that allocations with 
non-overlapping lifetime can be in the same spot or in different spots.

IOW, coalescing "x" and "y" in this program is correct, but
folding the
comparison to false is incorrect.

So when you disagree with my statement that we have "a clear path to
evaluate
whether any given folding or coalescing is correct", then which of the
above to
you disagree with?

Folding to false would be correct if it would be accompanied by also embedding 
into the program some hint which tells the allocator that the two allocations 
must not be at the same spot in memory. This would be a way to ensure that LLVM 
"sticks to its choice". But that issue is entirely orthogonal to the
semantics
of lifetime.start / lifetime.end; it arises with allocation lifetimes of any 
kind, including malloc/free.

I think you are proposing to make them only about the contents of the alloca, so
in the above "foo", now the lifetime of (the allocations backing)
"x" and "y"
*would* overlap, so folding to false (of escaped pointers) is... still not 
allowed, but now for less subtle reasons. But this doesn't *solve* the
"fold to
false" problem, it just means that the problem no longer applies to this 
particular program.

Or is your argument that, since "fold to false" is wrong in both
cases, it
doesn't really matter which choice we make?  However, in your proposal, the 
option of doing the fold while also "adding hints for the allocator"
no longer
exists.
> I don't think anyone wanted that. TBH, I think my proposal is by far
the most
> compatible one wrt. the "classic" model w/o lifetime markers,
given
> that I proposed not to entangle them in object allocation. Everything else
I
> proposed are "as-if" transformations. Do you disagree?
I agree that making the markers just erase the contents of the alloca is the 
simplest proposal on the table. *If* we truly can make LLVM follow that model, 
that would be very nice.
I would just be very surprised if it was possible to get this proposal past the 
people that will carefully watch over the existing optimization potential LLVM 
has, and make sure it does not become smaller. ;)  Maybe I am too pessimistic, 
but I would have never dared even making such a proposal as I would expect it to
be shot down. I'll be happy to be proven wrong about this. :)

Kind regards,
Ralf
> 
> ~ Johannes
> 
> 
> 
>>
>> Kind regards,
>> Ralf
>>
>>>
>>> ~ Johannes
>>>
>>>
>>>>
>>>> Kind regards,
>>>> Ralf
>>>>
>>>>> A lot of my arguments around alloca/malloc/... is that we
can augment the
>>>>> information from the frontend by determining if an address
is observed or
>>>>> not. If it is not, we can limit its allocation, coalesce
it, etc. Lifetime
>>>>> markers can help us do this but they are not the only way.
>>>>> So, basically, we do not need lifetime markers to coalesce 
>>>>> alloca/malloc/... but that they can help.
>>>>
>>>>
>>>>
>>>>>
>>>>> Let's try to get out of this rabbit hole. If you have a
problem with the
>>>>> proposal as outlined in 
>>>>>
https://lists.llvm.org/pipermail/llvm-dev/2021-January/147594.html , please
>>>>> let me know. I assume dealing with observed addresses
different than with
>>>>> non-observede ones (implied by noescape) is the way forward
now.
>>>>>
>>>>> ~ Johannes
>>>>>
>>>>>
>>>>>
>>>>>>
>>>>>> Kind regards,
>>>>>> Ralf
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Kind regards,
>>>>>>>> Ralf
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Does that make some sense?
>>>>>>>>>
>>>>>>>>> ~ Johannes
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Kind regards,
>>>>>>>>>> Ralf
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 1/7/21 10:12 PM, Juneyoung Lee
wrote:
>>>>>>>>>>>> On Fri, Jan 8, 2021 at 8:54 AM
Johannes Doerfert
>>>>>>>>>>>> <johannesdoerfert at
gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> On 1/7/21 4:55 PM,
Juneyoung Lee wrote:
>>>>>>>>>>>>>> It is, alloca must have
its integral address decided at its
>>>>>>>>>>>>>> *allocation*
>>>>>>>>>>>>>> time as well, like
0xFF00... .
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> For example, if
0xFFFF0000 ~ 0xFF000000 is already allocated, the new
>>>>>>>>>>>>>> alloca cannot be placed
there unless it is not observed in the
>>>>>>>>>>>>>> future,
>>>>>>>>>>>>>> according to your
proposal.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> But how do we know from
the current state whether the stack
>>>>>>>>>>>>>> variable is
>>>>>>>>>>>>>> going to be observed or
not?
>>>>>>>>>>>>> With that argument you
could allocate all but 4 bytes, do a malloc(4)
>>>>>>>>>>>>> and know the address of the
returned pointer (assuming it is not
>>>>>>>>>>>>> null).
>>>>>>>>>>>>>
>>>>>>>>>>>>> What I try to say is,
either your scenario is part of the model and
>>>>>>>>>>>>> everything we do is broken
as you could "observe" addresses passively,
>>>>>>>>>>>>> *or*, the abstract machine
we use for semantic reasoning doesn't
>>>>>>>>>>>>> permit
>>>>>>>>>>>>> the above reasoning. I
really hope for the latter.
>>>>>>>>>>>>>
>>>>>>>>>>>> That's a very valid
point..!
>>>>>>>>>>>>
>>>>>>>>>>>> Actually, I have a memory model
that addresses this problem.
>>>>>>>>>>>> The gist is that we can
interpret each allocation instruction as
>>>>>>>>>>>> *creating
>>>>>>>>>>>> 2 blocks* and
nondeterministically returning one of them.
>>>>>>>>>>>> The other one is marked as
invalid but not freed immediately.
>>>>>>>>>>>> Deallocation
>>>>>>>>>>>> frees both blocks.
>>>>>>>>>>>> If there is not enough space
for having two blocks, it is treated as
>>>>>>>>>>>> out-of-memory.
>>>>>>>>>>>>
>>>>>>>>>>>> It makes guessing the address
of an allocation according to memory
>>>>>>>>>>>> layout
>>>>>>>>>>>> invalid UB.
>>>>>>>>>>>>
>>>>>>>>>>>> p = malloc(4)
>>>>>>>>>>>> // If p != null, it is
guaranteed there was at least two 4 byte slots
>>>>>>>>>>>> available, say 0x100~0x104 and
0x110~0x114.
>>>>>>>>>>>> // Two blocks are allocated at
0x110 and 0x114, and one of the
>>>>>>>>>>>> addresses is
>>>>>>>>>>>> nondeterministically returned.
>>>>>>>>>>>> // By the nature of
nondeterminism, all executions below should be
>>>>>>>>>>>> well-defined regardless of the
address of p.
>>>>>>>>>>>> *(int*)0x100 = 10; // In an
execution where p is 0x110, this raises UB.
>>>>>>>>>>>>
>>>>>>>>>>>> The paper link is here
<https://dl.acm.org/doi/10.1145/3276495> FYI.
>>>>>>>>>>>
>>>>>>>>>>> Nice! Thanks for the link :)
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> To argue differently: Who is to
say there is a stack, or only one,
>>>>>>>>>>>>> or that alloca allocates
memory "on the one stack"? That is not
>>>>>>>>>>>>> part of
>>>>>>>>>>>>> the IR, IMHO. I can write a
machine on which alloca lowers to malloc,
>>>>>>>>>>>>> I don't even need the
free during stack unwind but that I could do as
>>>>>>>>>>>>> well if I wanted to.
>>>>>>>>>>>>
>>>>>>>>>>>> This is right, the example was
for illustrative purposes. IR does not
>>>>>>>>>>>> enforce alloca to be placed at
'stack'.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>> Making icmp/ptrtoint
yield poison will still make loop versioning or
>>>>>>>>>>>>>> pointer rewriting
transformations unsound because these operations
>>>>>>>>>>>>>> now
>>>>>>>>>>>>> can
>>>>>>>>>>>>>> create poison (even if
pointers are noundef).
>>>>>>>>>>>>> I did not say they yield
poison, at least I did not try to say that.
>>>>>>>>>>>>> What are you referring to
exactly?
>>>>>>>>>>>>>
>>>>>>>>>>>> I was referring to this:
>>>>>>>>>>>>
>>>>>>>>>>>>>> But this makes
ptrtoint/icmp make UB-raising instructions, which
>>>>>>>>>>>>>> contradicts with what
LLVM does.
>>>>>>>>>>>>> As with other violation of
attributes I would, on first though,
>>>>>>>>>>>>> suggest
>>>>>>>>>>>>> to produce poison, not UB.
>>>>>>>>>>>> But it is more about the
(imaginary) attribute, so maybe I was
>>>>>>>>>>>> slightly out
>>>>>>>>>>>> of topic.
>>>>>>>>>>>>
>>>>>>>>>>>>> 2) is fine, I think the
suggestion semantically makes sense
>>>>>>>>>>>>> perfectly. 1)
>>>>>>>>>>>>>> is something I'm
concerned about now.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> There are more than
pointer foldings, such as rewriting such
>>>>>>>>>>>>>> expression,
>>>>>>>>>>>>>> code motion ptr cmp,
introduce ptr cmp, etc. There is also analysis
>>>>>>>>>>>>> relying
>>>>>>>>>>>>>> on ptr cmp.
>>>>>>>>>>>>>> Defining the
correctness of each of them is something we want to
>>>>>>>>>>>>>> avoid,
>>>>>>>>>>>>> and
>>>>>>>>>>>>>> maybe that's why we
want to define precise semantics for things.
>>>>>>>>>>>>> I don't get the point.
My proposal does not change the semantics of
>>>>>>>>>>>>> pointer comparisons, at
all. I explicitly mentioned that in the last
>>>>>>>>>>>>> email.
>>>>>>>>>>>>>
>>>>>>>>>>>> Oh okay, I thought it was a
part of the lifetime proposal, but it seems
>>>>>>>>>>>> more like a separate thing.
>>>>>>>>>>>> I agree that this requires
performance impact.
>>>>>>>>>>>> Also investigation of existing
transformations would be needed;
>>>>>>>>>>>> Alive2's
>>>>>>>>>>>> pointer comparison is doing
approximation yet, but if it becomes fully
>>>>>>>>>>>> precise, it will show something
from running LLVM unit tests I
>>>>>>>>>>>> believe..! :)
>>>>>>>>>>>>
>>>>>>>>>>>>> I think this will be less
aggressive and may give nice feedback to
>>>>>>>>>>>>>> potential projects that
are using lifetime with non-alloca.
>>>>>>>>>>>>> The lifetime marker debate,
basically 2) above, is orthogonal to the
>>>>>>>>>>>>> problem
>>>>>>>>>>>>> you try to solve. It got
mixed in as lifetime markers were used by
>>>>>>>>>>>>> StackColoring
>>>>>>>>>>>>> to perform coalescing but
that is coincidental. You can (arguably)
>>>>>>>>>>>>> coalesce stack
>>>>>>>>>>>>> allocations regardless of
lifetime markers and with 1) such a
>>>>>>>>>>>>> transformation
>>>>>>>>>>>>> (w/ and w/o lifetime
markers) would actually be sound.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> At the end, we can
implement IR writer that lowers lifetime with
>>>>>>>>>>>>> non-alloca
>>>>>>>>>>>>>> into memset(undef).
WDYT?
>>>>>>>>>>>>> Yeah, 2) is orthogonal and
we can lower it that way. Unsure if it is
>>>>>>>>>>>>> helpful
>>>>>>>>>>>>> but we can certainly define
it that way in the LangRef.
>>>>>>>>>>>>>
>>>>>>>>>>>> Okay, thanks!
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> ~ Johannes
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Juneyoung
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> p.s. The reply was
late, sorry. I think I can spend more time on this
>>>>>>>>>>>>> today.
>>>>>>>>>>>>>> On Thu, Jan 7, 2021 at
9:02 AM Johannes Doerfert <
>>>>>>>>>>>>> johannesdoerfert at
gmail.com>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 1/6/21 4:33 PM,
Juneyoung Lee wrote:
>>>>>>>>>>>>>>>>>>
Stepwisely defining the semantics of instructions is a desirable
>>>>>>>>>>>>>>>> direction
>>>>>>>>>>>>>>>>>> IMO.
>>>>>>>>>>>>>>>>> I'm
confused. What in the proposal would prevent us from defining
>>>>>>>>>>>>>>>>> the
semantics of instructions, or force us to do it in an
>>>>>>>>>>>>>>>>>
"undesirable
>>>>>>>>>>>>>>>> way"?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I meant it
would be great if the output state after executing an
>>>>>>>>>>>>>>>> instruction can
be described using its input state.
>>>>>>>>>>>>>>>> (that was the
meaning of 'stepwise semantics', I should have
>>>>>>>>>>>>>>>> been more
>>>>>>>>>>>>>>>> clear about
this)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> For example,
the semantics of 'z = add x y' can be defined as
>>>>>>>>>>>>>>>> follows:
>>>>>>>>>>>>>>>> Given an input
state s, next state s' = s[z -> s(x) + s(y)]
>>>>>>>>>>>>>>>> where s(x) is
the value of x in the previous state, and s[z ->
>>>>>>>>>>>>>>>> v] is a
>>>>>>>>>>>>>>>> state with z
updated to v.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Another example
that involves memory: the semantics of 'i =
>>>>>>>>>>>>>>>> ptrtoint p'
>>>>>>>>>>>>>>> can
>>>>>>>>>>>>>>>> be defined as
follows:
>>>>>>>>>>>>>>>> Given an input
state s, next state s' = s[i -> s(p).obj.address +
>>>>>>>>>>>>>>>> s(p).offset]
>>>>>>>>>>>>>>>> where
obj.address is the begin address of a memory object obj
>>>>>>>>>>>>>>>> pointed
>>>>>>>>>>>>> by
>>>>>>>>>>>>>>> p
>>>>>>>>>>>>>>>> & offset is
p's byte offset. (Imagine a pointer to the offset of
>>>>>>>>>>>>>>>> some
>>>>>>>>>>>>>>> char
>>>>>>>>>>>>>>>> array).
>>>>>>>>>>>>>>>> Note that
ptrtoint & add can be nicely defined w.r.t the input
>>>>>>>>>>>>>>>> state.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Now, the
instruction that we're facing is 'p = alloca'.
>>>>>>>>>>>>>>>> To describe the
output state after executing 'p = alloca', the
>>>>>>>>>>>>>>>> address
>>>>>>>>>>>>> of
>>>>>>>>>>>>>>>> new alloca
should be determined.
>>>>>>>>>>>>>>>> If observedness
is involved, we need to know the future state
>>>>>>>>>>>>>>>> again. :/
>>>>>>>>>>>>>>> We
>>>>>>>>>>>>>>>> don't know
whether the alloca is going to be observed or not
>>>>>>>>>>>>>>>> without
>>>>>>>>>>>>>>> seeing
>>>>>>>>>>>>>>>> the future.
>>>>>>>>>>>>>>>> This is the
problem of the current lifetime intrinsics as well.
>>>>>>>>>>>>>>> No, you mix things
up here.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Nobody proposed to
modify the semantics of `alloca`.
>>>>>>>>>>>>>>> `alloca` provides
you with a fresh, unobserved block of
>>>>>>>>>>>>>>> dereferenceable
memory that is implicitly freed as the stack
>>>>>>>>>>>>>>> unwinds. That is
it. No context necessary.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> If you want to
modify the IR, you need to argue the observable
>>>>>>>>>>>>>>> semantics which is
nothing new. That this might require more than
>>>>>>>>>>>>>>> a peephole view of
the program is also not new.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> One possible
approach to resolve this is adding an 'unobserved'
>>>>>>>>>>>>>>>> flag to
>>>>>>>>>>>>>>>> alloca
instruction (similar to what was suggested by Nicolai).
>>>>>>>>>>>>>>>> And, we can say
that if alloca with 'unobserved' is used by
>>>>>>>>>>>>>>> ptrtoint/icmp,
>>>>>>>>>>>>>>>> it is UB.
>>>>>>>>>>>>>>> The flag can be
added, like we add other attributes. It should not
>>>>>>>>>>>>>>> be required for any
optimization we talked about though. It
>>>>>>>>>>>>>>> basically
>>>>>>>>>>>>>>> is a way to
manifest derived or given information into the IR.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Attribute
deduction, as well as frontends with domain knowledge,
>>>>>>>>>>>>>>> can add such
information. The flag we discussed in phab was not even
>>>>>>>>>>>>>>> sufficient for all
the transformation examples I presented in my
>>>>>>>>>>>>>>> mail,
>>>>>>>>>>>>>>> that is why I
extended my  argument. We could still have a
>>>>>>>>>>>>>>>
"noescape"
>>>>>>>>>>>>>>> flag for allocas,
but I'm not sure how useful that really is. We can
>>>>>>>>>>>>>>> certainly deduce it
and manifest it, unsure if we have domain
>>>>>>>>>>>>>>> knowledge
>>>>>>>>>>>>>>> we can use for
non-trivial cases though.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> But this makes
ptrtoint/icmp make UB-raising instructions, which
>>>>>>>>>>>>>>>> contradicts
with what LLVM does.
>>>>>>>>>>>>>>> As with other
violation of attributes I would, on first though,
>>>>>>>>>>>>>>> suggest
>>>>>>>>>>>>>>> to produce poison,
not UB.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Also, existing
optimizations like loop versioning can introduce
>>>>>>>>>>>>>>>>
ptrtoint/pointer comparisons too.
>>>>>>>>>>>>>>> Sure. I am not
certain why that is a problem. I get the feeling
>>>>>>>>>>>>>>> things
>>>>>>>>>>>>>>> are still mixed up
here.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> What I proposed is
twofold:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 1) We stop folding
comparisons between different allocas if
>>>>>>>>>>>>>>> changing the
>>>>>>>>>>>>>>> address
>>>>>>>>>>>>>>>       of both might
be observable. Thus, if both might have their
>>>>>>>>>>>>>>> address
>>>>>>>>>>>>>>>
"taken"/escaped,
>>>>>>>>>>>>>>>       other than
the comparisons we want to fold, we cannot proceed.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2) We define
lifetime markers to mean `memset(undef)`.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The first should be
sufficient for the problem you were trying to
>>>>>>>>>>>>>>> solve
>>>>>>>>>>>>>>> in the
>>>>>>>>>>>>>>> first place. The
second makes lifetime markers less weird. Note
>>>>>>>>>>>>>>> that 1)
>>>>>>>>>>>>>>> is not changing
>>>>>>>>>>>>>>> the semantics of
the IR. We basically just argue there is a bug
>>>>>>>>>>>>>>> in our
>>>>>>>>>>>>>>> instcombine right
>>>>>>>>>>>>>>> now as we do not
check all necessary preconditions.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I see that
there are other questions that I didn't answer yet,
>>>>>>>>>>>>>>>> but let
>>>>>>>>>>>>> me
>>>>>>>>>>>>>>>> answer this
first to limit the length of the text :)
>>>>>>>>>>>>>>> Sure, we can split
the discussion :)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> ~ Johannes
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>> Juneyoung
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Thu, Jan 7,
2021 at 3:36 AM Johannes Doerfert <
>>>>>>>>>>>>>>> johannesdoerfert at
gmail.com>
>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On 1/5/21
8:00 PM, Juneyoung Lee wrote:
>>>>>>>>>>>>>>>>>> Hi
Johannes,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I read
your proposal and thought about the model.
>>>>>>>>>>>>>>>>> Cool,
thanks!
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> As you
concerned in A3, certain programs may be valid only when
>>>>>>>>>>>>> memory
>>>>>>>>>>>>>>>>>> blocks
with overlapping lifetimes have disjoint addresses.
>>>>>>>>>>>>>>>>>> Look at
this example (I'm using malloc, but alloca also works):
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> p1 =
malloc(4)
>>>>>>>>>>>>>>>>>> p2 =
malloc(4) // for brevity, assume that there as enough
>>>>>>>>>>>>>>>>>> space
& p1
>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>> p2 !=
null
>>>>>>>>>>>>>>>>>>
set<char*> s;
>>>>>>>>>>>>>>>>>>
s.insert(p1); s.insert(p2); // If the second insert did
>>>>>>>>>>>>>>>>>>
nothing, it
>>>>>>>>>>>>>>> would
>>>>>>>>>>>>>>>>>> be
surprise to programmers
>>>>>>>>>>>>>>>>>>
work(s);
>>>>>>>>>>>>>>>>>>
free(data1)
>>>>>>>>>>>>>>>>>>
free(data2)
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
Clearly, IR semantics should guarantee that escaped blocks are
>>>>>>>>>>>>>>> disjoint.
>>>>>>>>>>>>>>>>> It
>>>>>>>>>>>>>>>>>> would
be great for verification tools on LLVM IR to be able to
>>>>>>>>>>>>>>>>>> answer
>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>> the
second insert will succeed.
>>>>>>>>>>>>>>>>> I agree,
the second insert should succeed, assuming `p1 && p2`.
>>>>>>>>>>>>>>>>> I don't
think my proposal would in any way impact the program
>>>>>>>>>>>>>>>>> above,
>>>>>>>>>>>>>>>>> if anything
the A3 reasoning makes sure such a program with
>>>>>>>>>>>>>>>>> allocas
>>>>>>>>>>>>>>>>> is not
miscompiled.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I'm
also not sure I understand what you try to argue for. Maybe
>>>>>>>>>>>>>>>>> elaborate a
bit what it is you think is bad or needs to be
>>>>>>>>>>>>>>>>> changed.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> The
problem is that definition of escapedness is not clear at the
>>>>>>>>>>>>>>>>> semantic
>>>>>>>>>>>>>>>>>> level.
Describing the IR semantics w.r.t. LLVM's escaped analysis
>>>>>>>>>>>>>>>>> function
>>>>>>>>>>>>>>>>>>
isn't something we want.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
Semantic definition of escapedness of a pointer seems hard, I
>>>>>>>>>>>>>>>>>> mean
>>>>>>>>>>>>> in a
>>>>>>>>>>>>>>>>>>
stepwise manner.
>>>>>>>>>>>>>>>>>> It
isn't a single instruction such as 'escape i8* ptr', and we
>>>>>>>>>>>>>>>>>> need
>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>> look
>>>>>>>>>>>>>>>>>> over
all instructions in the function. For example, '(int)(p+1) -
>>>>>>>>>>>>>>> (int)p'
>>>>>>>>>>>>>>>>>>
isn't semantically escaping the pointer p because the result is 1
>>>>>>>>>>>>>>>>>>
regardless of the value of p.
>>>>>>>>>>>>>>>>>>
Stepwisely defining the semantics of instructions is a desirable
>>>>>>>>>>>>>>>>> direction
>>>>>>>>>>>>>>>>>> IMO.
>>>>>>>>>>>>>>>>> I'm
confused. What in the proposal would prevent us from defining
>>>>>>>>>>>>>>>>> the
semantics of instructions, or force us to do it in an
>>>>>>>>>>>>>>>>>
"undesirable
>>>>>>>>>>>>>>>>> way"?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> In
practice, syntactically checking escapedness + nice
>>>>>>>>>>>>>>>>>>
engineering
>>>>>>>>>>>>>>> might
>>>>>>>>>>>>>>>>>> not
break optimizations in most cases (as undef/poison did);
>>>>>>>>>>>>>>>>>> but it
>>>>>>>>>>>>>>> would
>>>>>>>>>>>>>>>>>> be
great to move to another level, since LLVM IR is used in so
>>>>>>>>>>>>>>>>>> many
>>>>>>>>>>>>>>>>> places
>>>>>>>>>>>>>>>>>> :)
>>>>>>>>>>>>>>>>> The
property under which you can coalesce objects is simple:
>>>>>>>>>>>>>>>>>       It is
not observable.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Now, if the
address of one of the two objects you coalesce is not
>>>>>>>>>>>>>>>>> observed,
coalescing is not observable. That is a sufficient
>>>>>>>>>>>>>>>>> condition
>>>>>>>>>>>>>>>>> not a
necessary one. Pointer "escaping" is one step further. If
>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>> address
doesn't escape it is not observed. This does not mean the
>>>>>>>>>>>>>>>>>
"semantic conditions for coalescing", e.g., for the purpose of
>>>>>>>>>>>>>>> translation
>>>>>>>>>>>>>>>>> validation,
is supposed to be build on top of our "definition of
>>>>>>>>>>>>>>> escaping
>>>>>>>>>>>>>>>>>
pointers". That said, we use "does not escape" as a
>>>>>>>>>>>>>>>>>
precondition for
>>>>>>>>>>>>>>>>> various
transformation and I'm unsure what is any different
>>>>>>>>>>>>>>>>> now. The
>>>>>>>>>>>>>>>>> entire
escape argument is only used in the validity of the pointer
>>>>>>>>>>>>>>> folding.
>>>>>>>>>>>>>>>>> Similarly,
we can fold a comparison of a noalias pointer with
>>>>>>>>>>>>>>>>> another
>>>>>>>>>>>>>>> one
>>>>>>>>>>>>>>>>> if the
former does not escape (and both are dereferenced and
>>>>>>>>>>>>>>>>> one is
>>>>>>>>>>>>>>>>> written for
sure).
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> The
pointer comparison is another beast. It indeed has a few
>>>>>>>>>>>>>>>>>> issues,
>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>> solving
it might require nontrivial solution.
>>>>>>>>>>>>>>>>> I think the
main problem of the inconsistencies and such we've
>>>>>>>>>>>>>>>>> seen is
>>>>>>>>>>>>>>>>> rooted in
the erroneous folding of pointer comparisons.
>>>>>>>>>>>>>>>>> Cleaning up
>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>> lifetime
marker semantics is actually unrelated and simply not
>>>>>>>>>>>>>>>>> folding
>>>>>>>>>>>>>>>>> as
described in A3 should solve the issue that has been reported.
>>>>>>>>>>>>>>>>>
Nevertheless,
>>>>>>>>>>>>>>>>> we should
take a crack at lifetime markers while we are here.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> ~ Johannes
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>
Juneyoung
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Tue,
Jan 5, 2021 at 9:37 AM Johannes Doerfert <
>>>>>>>>>>>>>>>>>
johannesdoerfert at gmail.com>
>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hi
Juneyoung,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
Happy new year :)
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
After we had a lengthy discussion on phab last year, I've
>>>>>>>>>>>>>>>>>>>
tried to
>>>>>>>>>>>>>>>>>>>
summarize my thoughts,
>>>>>>>>>>>>>>>>>>>
especially given that I had some time to think about things
>>>>>>>>>>>>>>>>>>>
over the
>>>>>>>>>>>>>>>>> break.
>>>>>>>>>>>>>>>>>>>
Still, no promises on the quality ;)
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I
start with general questions I asked myself and then go on
>>>>>>>>>>>>> rambling
>>>>>>>>>>>>>>>>>>>
about a potential design.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Q1:
Is lifetime a given property or a derived one, thus is it
>>>>>>>>>>>>>>>>>>>
fixed
>>>>>>>>>>>>> or
>>>>>>>>>>>>>>>>>>>
modifiable?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
This is a question I asked myself a lot recently. I think it is
>>>>>>>>>>>>>>> derived
>>>>>>>>>>>>>>>>>>> and
modifiable,
>>>>>>>>>>>>>>>>>>> at
least I hope it is. Only that would allow transformations
>>>>>>>>>>>>>>>>>>> I
would
>>>>>>>>>>>>>>>>> want
>>>>>>>>>>>>>>>>>>> us
to do. Here are some examples:
>>>>>>>>>>>>>>>>>>>
https://godbolt.org/z/G8obj3
>>>>>>>>>>>>>>>>>>>
https://godbolt.org/z/obaTc
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Q2:
Is a pointer comparison, or similar use, extending the
>>>>>>>>>>>>>>>>>>>
lifetime?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
Asked differently, can we hoist a pointer comparison into a
>>>>>>>>>>>>>>>>>>>
region
>>>>>>>>>>>>>>> where
>>>>>>>>>>>>>>>>>>> the
pointer is dead?
>>>>>>>>>>>>>>>>>>>
This is an important question which we haven't discussed much
>>>>>>>>>>>>>>>>>>> as
we
>>>>>>>>>>>>>>>>>>>
assumed LICM has to work.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> The
current behavior is that non-dereferencing uses are not
>>>>>>>>>>>>> extending
>>>>>>>>>>>>>>>>>>> the
lifetime and are
>>>>>>>>>>>>>>>>>>>
allowed outside of "lifetime regions" (as indicated by markers).
>>>>>>>>>>>>> They
>>>>>>>>>>>>>>>>>>>
will always produce valid
>>>>>>>>>>>>>>>>>>>
results. Though, we might want to think about a lifetime
>>>>>>>>>>>>>>>>>>>
marker that
>>>>>>>>>>>>>>>>>>>
spits out a new pointer
>>>>>>>>>>>>>>>>>>>
value instead of reusing the old one.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Q3:
Can we use lifetime to argue about addresses?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> The
question here is, can we fold address comparisons based on
>>>>>>>>>>>>>>>>>>>
lifetimes, or not.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> So
far, we fold comparisons based on "address information". For
>>>>>>>>>>>>>>> example,
>>>>>>>>>>>>>>>>>>> we
"know" globals,
>>>>>>>>>>>>>>>>>>>
allocas, and mallocs cannot be equal to one another. Also, two
>>>>>>>>>>>>>>> distinct
>>>>>>>>>>>>>>>>>>>
allocations, for globals
>>>>>>>>>>>>>>>>>>> and
allocas, are considered unequal. Now, the crux is that we
>>>>>>>>>>>>>>>>>>>
have
>>>>>>>>>>>>> to
>>>>>>>>>>>>>>> be
>>>>>>>>>>>>>>>>>>>
consistent if we do two
>>>>>>>>>>>>>>>>>>>
comparisons, and, as of now, we are not (bug number missing).
>>>>>>>>>>>>>>>>>>>
Since
>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>
backend (or any other place
>>>>>>>>>>>>>>>>>>> for
that matter) might coalesce allocas, their addresses
>>>>>>>>>>>>>>>>>>>
might not
>>>>>>>>>>>>> be
>>>>>>>>>>>>>>>>>>>
different after all. If we
>>>>>>>>>>>>>>>>>>>
already folded a comparison to "unequal" we are doomed if we
>>>>>>>>>>>>>>>>>>>
later
>>>>>>>>>>>>>>> have
>>>>>>>>>>>>>>>>>>> a
comparison that results
>>>>>>>>>>>>>>>>>>> in
"equal". (Note, this is different from aliasing rules as
>>>>>>>>>>>>>>>>>>>
they can
>>>>>>>>>>>>>>> be
>>>>>>>>>>>>>>>>>>>
stricter.)
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
Design:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I
would hope we can come up with a model that treats memory "the
>>>>>>>>>>>>>>> same",
>>>>>>>>>>>>>>>>>>>
regardless if it is global,
>>>>>>>>>>>>>>>>>>>
stack, or heap. I want to avoid special casing one of them wrt.
>>>>>>>>>>>>>>> lifetime
>>>>>>>>>>>>>>>>>>> as
I believe most optimizations
>>>>>>>>>>>>>>>>>>>
would apply to any of them, potentially for different reasons
>>>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>> with
>>>>>>>>>>>>>>>>>>>
different gains, but nevertheless.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
Proposal (largely based on the conversation in phab):
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> A1:
Lifetime is a concept that talks about memory content
>>>>>>>>>>>>>>>>>>>
*only*.
>>>>>>>>>>>>>>>>>>>
Basically, the memory content is set to
>>>>>>>>>>>>>>>>>>>
         undefined by lifetime markers. It is
>>>>>>>>>>>>>>>>>>>
derived/modifiable as
>>>>>>>>>>>>> it
>>>>>>>>>>>>>>> is
>>>>>>>>>>>>>>>>>>>
just an "as-is" property of the memory
>>>>>>>>>>>>>>>>>>>
         content. The lifetimes of an object, as described by
>>>>>>>>>>>>> markers,
>>>>>>>>>>>>>>>>> might
>>>>>>>>>>>>>>>>>>>
change during the compilation. They
>>>>>>>>>>>>>>>>>>>
         might become smaller if we deduce the object is not
>>>>>>>>>>>>>>>>>>>
accessed
>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>> the
memory content is not used, they
>>>>>>>>>>>>>>>>>>>
         might become larger if objects with non-overlapping
>>>>>>>>>>>>> lifetimes
>>>>>>>>>>>>>>> are
>>>>>>>>>>>>>>>>>>>
coalesced. (One could see the latter as
>>>>>>>>>>>>>>>>>>>
         introducing a new object though.)
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> A2:
If we define lifetime as above, it has nothing to do with
>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>
address of an object. Consequently, pointer
>>>>>>>>>>>>>>>>>>>
         comparisons and similar operations are valid outside
>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>> lifetime.
>>>>>>>>>>>>>>>>>>>
Loads and stores are as well, they can
>>>>>>>>>>>>>>>>>>>
         even not be removed "easily". A store followed by a
>>>>>>>>>>>>>>>>>>>
lifetime
>>>>>>>>>>>>>>>>> marker
>>>>>>>>>>>>>>>>>>> or
a load following a lifetime marker
>>>>>>>>>>>>>>>>>>>
         is dead or results in undef respectively.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> A3:
We could not use lifetime to argue about addresses. This
>>>>>>>>>>>>>>>>>>>
means
>>>>>>>>>>>>> we
>>>>>>>>>>>>>>>>>>>
could/should also not argue that overlapping
>>>>>>>>>>>>>>>>>>>
         lifetimes result in different addresses. Thus, a
>>>>>>>>>>>>>>>>>>>
comparison
>>>>>>>>>>>>>>>>> between
>>>>>>>>>>>>>>>>>>> the
address of two allocas could not
>>>>>>>>>>>>>>>>>>>
         immediately be folded. That said, there would be
>>>>>>>>>>>>>>>>>>>
special
>>>>>>>>>>>>> cases
>>>>>>>>>>>>>>>>>>>
though. Basically, if one of the allocas does
>>>>>>>>>>>>>>>>>>>
         not escape, other than the comparisons to be folded,
>>>>>>>>>>>>>>>>>>> we
can
>>>>>>>>>>>>>>> fold
>>>>>>>>>>>>>>>>>>>
them. Afterwards, coalescing or splitting
>>>>>>>>>>>>>>>>>>>
         would still be consistent because it is
>>>>>>>>>>>>>>>>>>>
unobservable. The
>>>>>>>>>>>>>>> problem
>>>>>>>>>>>>>>>>>>> we
have in-tree is that we fold even though
>>>>>>>>>>>>>>>>>>>
         the address is still observed (after the fold). It
>>>>>>>>>>>>>>>>>>> is
still
>>>>>>>>>>>>>>>>> unclear
>>>>>>>>>>>>>>>>>>> to
me what the impact of this would be
>>>>>>>>>>>>>>>>>>>
         on real code. I suggested before that we run some
>>>>>>>>>>>>> experiments
>>>>>>>>>>>>>>>>> first
>>>>>>>>>>>>>>>>>>>
before we make any decision whatsoever.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
This is pretty much saying that lifetime markers are
>>>>>>>>>>>>> `memset(undef)`,
>>>>>>>>>>>>>>> as
>>>>>>>>>>>>>>>>>>> you
suggested before (I think).
>>>>>>>>>>>>>>>>>>>
There are some implementation level differences but at the
>>>>>>>>>>>>>>>>>>> end
of
>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>> day
they are basically the same.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
Happy to hear some thoughts on this, especially if it fixes the
>>>>>>>>>>>>>>> problems
>>>>>>>>>>>>>>>>>>>
that lead to D93376 in the first place.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> ~
Johannes
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On
12/18/20 2:42 AM, Juneyoung Lee via llvm-dev wrote:
>>>>>>>>>>>>>>>>>>>>
Hello all,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
We're discussing the well-formedness of lifetime.start/end
>>>>>>>>>>>>> intrinsic
>>>>>>>>>>>>>>>>>>>
here (
>>>>>>>>>>>>>>>>>>>>
https://reviews.llvm.org/D93376), deciding what is a
>>>>>>>>>>>>> (syntactically
>>>>>>>>>>>>>>> &
>>>>>>>>>>>>>>>>>>>>
semantically) valid use of these intrinsics.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
I'd like to gather more context about the intrinsics.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
1. Is there a frontend or framework that introduces lifetime
>>>>>>>>>>>>>>>>>>>>
call
>>>>>>>>>>>>>>> with
>>>>>>>>>>>>>>>>>>>>
non-stack allocated objects?
>>>>>>>>>>>>>>>>>>>>
If exists, which behavior do they expect from it?
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
2. Can a block's lifetime be started bytewise or elementwise?
>>>>>>>>>>>>>>>>>>>>
I imagine an optimization that allow using stack very
>>>>>>>>>>>>>>>>>>>>
compactly,
>>>>>>>>>>>>> but
>>>>>>>>>>>>>>>>>>>
wonder
>>>>>>>>>>>>>>>>>>>>
there is a real-world use case.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
Thanks,
>>>>>>>>>>>>>>>>>>>>
Juneyoung
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
_______________________________________________
>>>>>>>>>>>>>>>>>>>>
LLVM Developers mailing list
>>>>>>>>>>>>>>>>>>>>
llvm-dev at lists.llvm.org
>>>>>>>>>>>>>>>>>>>>
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>
>>
-- 
Website: https://people.mpi-sws.org/~jung/

Johannes Doerfert via llvm-dev

2021-Feb-01 05:36 UTC

head link

[llvm-dev] lifetime.start/end

On 1/30/21 7:40 AM, Ralf Jung wrote:> Hi Johannes,
>
>>>> I was opposing the proposal because of the brittle syntactic 
>>>> restrictions. In my attempt to justify not having those I also 
>>>> tried to resolve the above inconsistency in a more general
manner
>>>> because I thought that was what it was all about. I believe
that we
>>>> either have to argue the address of a dead object is 
>>>> unusable/undetermined, basically the direction of James
comment, or
>>>> we need to decouple lifetime markers from locations.
>>>
>>> Folding to false and coalescing are inconsistent with each other, 
>>> yes. It is just wrong to perform both of these on the above
program,
>>> I would say (if we interpret it with LLVM semantics, which I assume
>>> has no "pointer zapping"). Is that up for debate?
>>>
>>> However, I do not see the immediate connection to lifetime markers 
>>> -- the same would be true for a program using malloc/free. 
>>> Coalescing is permissible whenever lifetimes are disjoint. Folding 
>>> to false is only permissible if the pointers are *definitely not 
>>> equal*, e.g. if their lifetimes overlap.
>>>
>> I still believe folding can be done regardless of their lifetime, as 
>> long as it is done consistent. Fold all pointer observations and the 
>> actual
>> placement is not observable, we went down that road.
>
> Sure, but that's the kind of exception that always applies when 
> discussing optimizations.
> But to make my statement more precise: you cannot fold comparison of 
> *escaped* pointers to false unless the pointers are *definitely not 
> equal*, e.g. if their lifetimes overlap.
>The interesting part is, lifetimes change. I continue to argue you 
cannot fold if both pointers have escaped. If you do, and other things 
happen that fiddle with the lifetime, you might accidentally coalesce 
them (explicitly or implicitly).

>> I agree with the malloc/free comparison, I did argue that the entire 
>> time, or I tried.
>>
>> If there is no/little connection to lifetime markers, I assume 
>> https://reviews.llvm.org/D93376 did not fix the inconsistency then. 
>> It was,
>> apparently, just an approach to clarify what a lifetime marker is, 
>> right? I might be more confused than before. I believe you should
equate
>> lifetime markers and memset(undef) and put the entire discussion 
>> about folding/coalescing onto a foundation that is unrelated to
lifetime
>> markers.
>
> My understanding is that lifetime markers are *explicitly intended to 
> support coalescing* by permitting later phases to put allocas with 
> disjoint lifetimes into the same stack slot. D93376 intended to keep 
> it that way; fundamentally changing what lifetime markers are about by 
> making them only about the content and not the address of the alloca 
> seemed like a much more invasive change.
> Am I misunderstanding something?
I do think they were designed for that, yes. And I do agree my change 
looks more invasive from that perspective.

Now does D93376 help with the problems we have is my question. Or, asked 
differently, what is a way out of this which improves the big picture, 
and what is practical. For the first, limiting it to allocas seems to be 
the wrong way. So far, the argument seems to be that "we only use it for 
them", but I'm not sure why a use for other pointers would be somehow 
bad (https://godbolt.org/z/G8obj3). Second, limiting it to syntactic 
constructs is the opposite of practical. It has been said before, 
syntactic constraints are brittle. Beyond that, the syntactic def-use 
chain requirement has not even come up in any of this mails, at least as 
far as I understand it. It still is unclear to me how that would help.

That all said, let's see what we need:
1) A way to ensure we don't create inconsistent situations wrt. 
allocation addresses where there were none.
2) A way to ensure we can reasonably coalesce allocas to minimize stack 
usage.

For one, I feel we converged pretty much on the fact that pointer 
comparison folding is a problem. In the example below we have lifetime 
markers and we use them to argue about lifetimes of the allocations, but 
we still agree the folding should not happen. Once the folding is 
restricted to pointer pairs for which at least one doesn't have it's 
address observed, I would like to revisit 1). So far, I believe it would 
be resolved. For 2), we could stick with what we have now, or adapt 
something new afterwards. I guess the existing wording needs to be 
cleared up either way.

Long story short, I don't see how D93376 addresses what seems to be the 
problem.


>
>>> The syntactic condition also provides a clear path to evaluate 
>>> whether any given folding or coalescing is correct. 
>>
>> I disagree. The syntactic conditions, at least as I read them, allow 
>> the above program with lifetime markers (as we have it now) but did not
>> suggest it would be wrong to fold and later coalesce. Or maybe I 
>> missed why the syntactic restrictions would prevent that, please 
>> explain then.
>
> Where in that program would you place the lifetime markers? I assume 
> at the beginning and end of the local variable's scope?
> So that would become something like (also reflecting that allocas are 
> all at the top):
>
> void foo() {
>    intptr_t xp, yp;
>    int x, y;
>    { lifetime.start(&x); xp = &x; bar(&x);
lifetime.end(&x); }
>    { lifetime.start(&y); yp = &y; bar(&y);
lifetime.end(&y); }
>    print(xp == yp);               // can we fold this comparison to 
> false?
> }
>Yes, this is how clang would generate the code right now.

> I think evaluating correctness of folding to false here is quite clear 
> (I assume this is just using C syntax to represent LLVM IR, so we 
> applyLLVM IR semantics):
Agreed.

> - The program as-written is non-deterministic; x and y have disjoint 
> lifetime so they could have the same or a different address.
Agreed (for the C version for which the above is the LLVM-IR 
implementation).

> - When the compiler decides to fold the comparison to false, it means 
> it has to commit to one of the possible non-det choices, and needs to 
> stick to that choice. 
Agreed.

> So *just* replacing "xp == yp" by "false" is incorrect
in the sense
> that the new program has a possible behavior that did not occur with 
> the old program: "true false" can now be printed (*if* the
allocator
> chooses to put x and y into the same location), whereas the original 
> program would never print "true false". 
Agreed.

> Note that we do not need to talk about coalescing here, we are just 
> exploiting the fact that allocation is non-deterministic and that 
> allocations with non-overlapping lifetime can be in the same spot or 
> in different spots.
Sure. When I refer to coalescing I was also including "implicit 
coalescing", e.g., as happened in my outlining example.

>
> IOW, coalescing "x" and "y" in this program is correct,
but folding
> the comparison to false is incorrect.
Agreed, assuming we don't do anything to bar.

>
> So when you disagree with my statement that we have "a clear path to 
> evaluate whether any given folding or coalescing is correct", then 
> which of the above to you disagree with?
None. I disagreed with the part "The syntactic condition also provides a 
clear path ...", which implies this path was dependent on the syntactic 
condition. I don't see you making an argument about syntactic def-use 
chains above.

I am basically struggling to understand why we need to introduce 
syntactic restrictions on lifetime markers and why they need to be 
limited to allocas. Those are two concepts included in D93376, right?

>
> Folding to false would be correct if it would be accompanied by also 
> embedding into the program some hint which tells the allocator that 
> the two allocations must not be at the same spot in memory. This would 
> be a way to ensure that LLVM "sticks to its choice". But that
issue is
> entirely orthogonal to the semantics of lifetime.start / lifetime.end; 
> it arises with allocation lifetimes of any kind, including malloc/free.
>
> I think you are proposing to make them only about the contents of the 
> alloca, so in the above "foo", now the lifetime of (the
allocations
> backing) "x" and "y" *would* overlap, so folding to
false (of escaped
> pointers) is... still not allowed, but now for less subtle reasons. 
> But this doesn't *solve* the "fold to false" problem, it just
means
> that the problem no longer applies to this particular program.
My original email in this thread addressed the folding problem but I 
clarified since: Don't fold if both addresses are observed, regardless 
of any lifetime argument. You seem to mix the comment I made on phab 
last year with the email discussion. The phab conversation is outdated 
and replaced by this email thread.

>
> Or is your argument that, since "fold to false" is wrong in both 
> cases, it doesn't really matter which choice we make?  However, in 
> your proposal, the option of doing the fold while also "adding hints 
> for the allocator" no longer exists.
My proposal comes in two parts: Fix the folding issue, then revisit 
markers. I'm not convinced that the folding issue can be fixed by
"hints
for the allocator" but I believe the escape argument holds.


>
>> I don't think anyone wanted that. TBH, I think my proposal is by
far
>> the most compatible one wrt. the "classic" model w/o lifetime
>> markers, given
>> that I proposed not to entangle them in object allocation. Everything 
>> else I proposed are "as-if" transformations. Do you disagree?
>
> I agree that making the markers just erase the contents of the alloca 
> is the simplest proposal on the table. *If* we truly can make LLVM 
> follow that model, that would be very nice.
> I would just be very surprised if it was possible to get this proposal 
> past the people that will carefully watch over the existing 
> optimization potential LLVM has, and make sure it does not become 
> smaller. ;)  Maybe I am too pessimistic, but I would have never dared 
> even making such a proposal as I would expect it to be shot down. I'll 
> be happy to be proven wrong about this. :)
I've made a few actually, `mustprogress` was the latest and you were 
involved ;). It does prevent certain optimizations we did before 
unconditionally but it also fixes a conceptual problem. At least the 
pointer inconsistency is a conceptual problem that needs fixing. What we 
do with lifetime markers is probably orthogonal.

~ Johannes

>
> Kind regards,
> Ralf
>
>>
>> ~ Johannes
>>
>>
>>
>>>
>>> Kind regards,
>>> Ralf
>>>
>>>>
>>>> ~ Johannes
>>>>
>>>>
>>>>>
>>>>> Kind regards,
>>>>> Ralf
>>>>>
>>>>>> A lot of my arguments around alloca/malloc/... is that
we can
>>>>>> augment the information from the frontend by
determining if an
>>>>>> address is observed or not. If it is not, we can limit
its
>>>>>> allocation, coalesce it, etc. Lifetime markers can help
us do
>>>>>> this but they are not the only way.
>>>>>> So, basically, we do not need lifetime markers to
coalesce
>>>>>> alloca/malloc/... but that they can help.
>>>>>
>>>>>
>>>>>
>>>>>>
>>>>>> Let's try to get out of this rabbit hole. If you
have a problem
>>>>>> with the proposal as outlined in 
>>>>>>
https://lists.llvm.org/pipermail/llvm-dev/2021-January/147594.html
>>>>>> , please let me know. I assume dealing with observed
addresses
>>>>>> different than with non-observede ones (implied by
noescape) is
>>>>>> the way forward now.
>>>>>>
>>>>>> ~ Johannes
>>>>>>
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> Kind regards,
>>>>>>> Ralf
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Kind regards,
>>>>>>>>> Ralf
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Does that make some sense?
>>>>>>>>>>
>>>>>>>>>> ~ Johannes
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Kind regards,
>>>>>>>>>>> Ralf
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 1/7/21 10:12 PM, Juneyoung
Lee wrote:
>>>>>>>>>>>>> On Fri, Jan 8, 2021 at 8:54
AM Johannes Doerfert
>>>>>>>>>>>>> <johannesdoerfert at
gmail.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 1/7/21 4:55 PM,
Juneyoung Lee wrote:
>>>>>>>>>>>>>>> It is, alloca must
have its integral address decided at
>>>>>>>>>>>>>>> its *allocation*
>>>>>>>>>>>>>>> time as well, like
0xFF00... .
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> For example, if
0xFFFF0000 ~ 0xFF000000 is already
>>>>>>>>>>>>>>> allocated, the new
>>>>>>>>>>>>>>> alloca cannot be
placed there unless it is not observed
>>>>>>>>>>>>>>> in the future,
>>>>>>>>>>>>>>> according to your
proposal.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> But how do we know
from the current state whether the
>>>>>>>>>>>>>>> stack variable is
>>>>>>>>>>>>>>> going to be
observed or not?
>>>>>>>>>>>>>> With that argument you
could allocate all but 4 bytes, do
>>>>>>>>>>>>>> a malloc(4)
>>>>>>>>>>>>>> and know the address of
the returned pointer (assuming it
>>>>>>>>>>>>>> is not null).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> What I try to say is,
either your scenario is part of the
>>>>>>>>>>>>>> model and
>>>>>>>>>>>>>> everything we do is
broken as you could "observe"
>>>>>>>>>>>>>> addresses passively,
>>>>>>>>>>>>>> *or*, the abstract
machine we use for semantic reasoning
>>>>>>>>>>>>>> doesn't permit
>>>>>>>>>>>>>> the above reasoning. I
really hope for the latter.
>>>>>>>>>>>>>>
>>>>>>>>>>>>> That's a very valid
point..!
>>>>>>>>>>>>>
>>>>>>>>>>>>> Actually, I have a memory
model that addresses this problem.
>>>>>>>>>>>>> The gist is that we can
interpret each allocation
>>>>>>>>>>>>> instruction as *creating
>>>>>>>>>>>>> 2 blocks* and
nondeterministically returning one of them.
>>>>>>>>>>>>> The other one is marked as
invalid but not freed
>>>>>>>>>>>>> immediately. Deallocation
>>>>>>>>>>>>> frees both blocks.
>>>>>>>>>>>>> If there is not enough
space for having two blocks, it is
>>>>>>>>>>>>> treated as
>>>>>>>>>>>>> out-of-memory.
>>>>>>>>>>>>>
>>>>>>>>>>>>> It makes guessing the
address of an allocation according
>>>>>>>>>>>>> to memory layout
>>>>>>>>>>>>> invalid UB.
>>>>>>>>>>>>>
>>>>>>>>>>>>> p = malloc(4)
>>>>>>>>>>>>> // If p != null, it is
guaranteed there was at least two 4
>>>>>>>>>>>>> byte slots
>>>>>>>>>>>>> available, say 0x100~0x104
and 0x110~0x114.
>>>>>>>>>>>>> // Two blocks are allocated
at 0x110 and 0x114, and one of
>>>>>>>>>>>>> the addresses is
>>>>>>>>>>>>> nondeterministically
returned.
>>>>>>>>>>>>> // By the nature of
nondeterminism, all executions below
>>>>>>>>>>>>> should be
>>>>>>>>>>>>> well-defined regardless of
the address of p.
>>>>>>>>>>>>> *(int*)0x100 = 10; // In an
execution where p is 0x110,
>>>>>>>>>>>>> this raises UB.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The paper link is here 
>>>>>>>>>>>>>
<https://dl.acm.org/doi/10.1145/3276495> FYI.
>>>>>>>>>>>>
>>>>>>>>>>>> Nice! Thanks for the link :)
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> To argue differently: Who
is to say there is a stack, or
>>>>>>>>>>>>> only one,
>>>>>>>>>>>>>> or that alloca
allocates memory "on the one stack"? That
>>>>>>>>>>>>>> is not part of
>>>>>>>>>>>>>> the IR, IMHO. I can
write a machine on which alloca
>>>>>>>>>>>>>> lowers to malloc,
>>>>>>>>>>>>>> I don't even need
the free during stack unwind but that I
>>>>>>>>>>>>>> could do as
>>>>>>>>>>>>>> well if I wanted to.
>>>>>>>>>>>>>
>>>>>>>>>>>>> This is right, the example
was for illustrative purposes.
>>>>>>>>>>>>> IR does not
>>>>>>>>>>>>> enforce alloca to be placed
at 'stack'.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Making
icmp/ptrtoint yield poison will still make loop
>>>>>>>>>>>>>>> versioning or
>>>>>>>>>>>>>>> pointer rewriting
transformations unsound because these
>>>>>>>>>>>>>>> operations now
>>>>>>>>>>>>>> can
>>>>>>>>>>>>>>> create poison (even
if pointers are noundef).
>>>>>>>>>>>>>> I did not say they
yield poison, at least I did not try
>>>>>>>>>>>>>> to say that.
>>>>>>>>>>>>>> What are you referring
to exactly?
>>>>>>>>>>>>>>
>>>>>>>>>>>>> I was referring to this:
>>>>>>>>>>>>>
>>>>>>>>>>>>>>> But this makes
ptrtoint/icmp make UB-raising
>>>>>>>>>>>>>>> instructions, which
>>>>>>>>>>>>>>> contradicts with
what LLVM does.
>>>>>>>>>>>>>> As with other violation
of attributes I would, on first
>>>>>>>>>>>>>> though, suggest
>>>>>>>>>>>>>> to produce poison, not
UB.
>>>>>>>>>>>>> But it is more about the
(imaginary) attribute, so maybe I
>>>>>>>>>>>>> was slightly out
>>>>>>>>>>>>> of topic.
>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2) is fine, I think the
suggestion semantically makes
>>>>>>>>>>>>>> sense perfectly. 1)
>>>>>>>>>>>>>>> is something
I'm concerned about now.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> There are more than
pointer foldings, such as rewriting
>>>>>>>>>>>>>>> such expression,
>>>>>>>>>>>>>>> code motion ptr
cmp, introduce ptr cmp, etc. There is
>>>>>>>>>>>>>>> also analysis
>>>>>>>>>>>>>> relying
>>>>>>>>>>>>>>> on ptr cmp.
>>>>>>>>>>>>>>> Defining the
correctness of each of them is something we
>>>>>>>>>>>>>>> want to avoid,
>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>> maybe that's
why we want to define precise semantics for
>>>>>>>>>>>>>>> things.
>>>>>>>>>>>>>> I don't get the
point. My proposal does not change the
>>>>>>>>>>>>>> semantics of
>>>>>>>>>>>>>> pointer comparisons, at
all. I explicitly mentioned that
>>>>>>>>>>>>>> in the last
>>>>>>>>>>>>>> email.
>>>>>>>>>>>>>>
>>>>>>>>>>>>> Oh okay, I thought it was a
part of the lifetime proposal,
>>>>>>>>>>>>> but it seems
>>>>>>>>>>>>> more like a separate thing.
>>>>>>>>>>>>> I agree that this requires
performance impact.
>>>>>>>>>>>>> Also investigation of
existing transformations would be
>>>>>>>>>>>>> needed; Alive2's
>>>>>>>>>>>>> pointer comparison is doing
approximation yet, but if it
>>>>>>>>>>>>> becomes fully
>>>>>>>>>>>>> precise, it will show
something from running LLVM unit
>>>>>>>>>>>>> tests I believe..! :)
>>>>>>>>>>>>>
>>>>>>>>>>>>>> I think this will be
less aggressive and may give nice
>>>>>>>>>>>>>> feedback to
>>>>>>>>>>>>>>> potential projects
that are using lifetime with non-alloca.
>>>>>>>>>>>>>> The lifetime marker
debate, basically 2) above, is
>>>>>>>>>>>>>> orthogonal to the
>>>>>>>>>>>>>> problem
>>>>>>>>>>>>>> you try to solve. It
got mixed in as lifetime markers
>>>>>>>>>>>>>> were used by
>>>>>>>>>>>>>> StackColoring
>>>>>>>>>>>>>> to perform coalescing
but that is coincidental. You can
>>>>>>>>>>>>>> (arguably)
>>>>>>>>>>>>>> coalesce stack
>>>>>>>>>>>>>> allocations regardless
of lifetime markers and with 1)
>>>>>>>>>>>>>> such a
>>>>>>>>>>>>>> transformation
>>>>>>>>>>>>>> (w/ and w/o lifetime
markers) would actually be sound.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> At the end, we can
implement IR writer that lowers
>>>>>>>>>>>>>>> lifetime with
>>>>>>>>>>>>>> non-alloca
>>>>>>>>>>>>>>> into memset(undef).
WDYT?
>>>>>>>>>>>>>> Yeah, 2) is orthogonal
and we can lower it that way.
>>>>>>>>>>>>>> Unsure if it is
>>>>>>>>>>>>>> helpful
>>>>>>>>>>>>>> but we can certainly
define it that way in the LangRef.
>>>>>>>>>>>>>>
>>>>>>>>>>>>> Okay, thanks!
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> ~ Johannes
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> Juneyoung
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> p.s. The reply was
late, sorry. I think I can spend more
>>>>>>>>>>>>>>> time on this
>>>>>>>>>>>>>> today.
>>>>>>>>>>>>>>> On Thu, Jan 7, 2021
at 9:02 AM Johannes Doerfert <
>>>>>>>>>>>>>> johannesdoerfert at
gmail.com>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 1/6/21 4:33
PM, Juneyoung Lee wrote:
>>>>>>>>>>>>>>>>>>>
Stepwisely defining the semantics of instructions is
>>>>>>>>>>>>>>>>>>> a
desirable
>>>>>>>>>>>>>>>>> direction
>>>>>>>>>>>>>>>>>>>
IMO.
>>>>>>>>>>>>>>>>>> I'm
confused. What in the proposal would prevent us
>>>>>>>>>>>>>>>>>> from
defining
>>>>>>>>>>>>>>>>>> the
semantics of instructions, or force us to do it
>>>>>>>>>>>>>>>>>> in an
"undesirable
>>>>>>>>>>>>>>>>> way"?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I meant it
would be great if the output state after
>>>>>>>>>>>>>>>>> executing
an
>>>>>>>>>>>>>>>>> instruction
can be described using its input state.
>>>>>>>>>>>>>>>>> (that was
the meaning of 'stepwise semantics', I
>>>>>>>>>>>>>>>>> should have
been more
>>>>>>>>>>>>>>>>> clear about
this)
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> For
example, the semantics of 'z = add x y' can be
>>>>>>>>>>>>>>>>> defined as
follows:
>>>>>>>>>>>>>>>>> Given an
input state s, next state s' = s[z -> s(x) +
>>>>>>>>>>>>>>>>> s(y)]
>>>>>>>>>>>>>>>>> where s(x)
is the value of x in the previous state,
>>>>>>>>>>>>>>>>> and s[z
-> v] is a
>>>>>>>>>>>>>>>>> state with
z updated to v.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Another
example that involves memory: the semantics of
>>>>>>>>>>>>>>>>> 'i =
ptrtoint p'
>>>>>>>>>>>>>>>> can
>>>>>>>>>>>>>>>>> be defined
as follows:
>>>>>>>>>>>>>>>>> Given an
input state s, next state s' = s[i ->
>>>>>>>>>>>>>>>>>
s(p).obj.address +
>>>>>>>>>>>>>>>>>
s(p).offset]
>>>>>>>>>>>>>>>>> where
obj.address is the begin address of a memory
>>>>>>>>>>>>>>>>> object obj
pointed
>>>>>>>>>>>>>> by
>>>>>>>>>>>>>>>> p
>>>>>>>>>>>>>>>>> &
offset is p's byte offset. (Imagine a pointer to the
>>>>>>>>>>>>>>>>> offset of
some
>>>>>>>>>>>>>>>> char
>>>>>>>>>>>>>>>>> array).
>>>>>>>>>>>>>>>>> Note that
ptrtoint & add can be nicely defined w.r.t
>>>>>>>>>>>>>>>>> the input
state.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Now, the
instruction that we're facing is 'p = alloca'.
>>>>>>>>>>>>>>>>> To describe
the output state after executing 'p =
>>>>>>>>>>>>>>>>>
alloca', the address
>>>>>>>>>>>>>> of
>>>>>>>>>>>>>>>>> new alloca
should be determined.
>>>>>>>>>>>>>>>>> If
observedness is involved, we need to know the
>>>>>>>>>>>>>>>>> future
state again. :/
>>>>>>>>>>>>>>>> We
>>>>>>>>>>>>>>>>> don't
know whether the alloca is going to be observed
>>>>>>>>>>>>>>>>> or not
without
>>>>>>>>>>>>>>>> seeing
>>>>>>>>>>>>>>>>> the future.
>>>>>>>>>>>>>>>>> This is the
problem of the current lifetime intrinsics
>>>>>>>>>>>>>>>>> as well.
>>>>>>>>>>>>>>>> No, you mix
things up here.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Nobody proposed
to modify the semantics of `alloca`.
>>>>>>>>>>>>>>>> `alloca`
provides you with a fresh, unobserved block of
>>>>>>>>>>>>>>>> dereferenceable
memory that is implicitly freed as the
>>>>>>>>>>>>>>>> stack
>>>>>>>>>>>>>>>> unwinds. That
is it. No context necessary.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> If you want to
modify the IR, you need to argue the
>>>>>>>>>>>>>>>> observable
>>>>>>>>>>>>>>>> semantics which
is nothing new. That this might require
>>>>>>>>>>>>>>>> more than
>>>>>>>>>>>>>>>> a peephole view
of the program is also not new.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> One
possible approach to resolve this is adding an
>>>>>>>>>>>>>>>>>
'unobserved' flag to
>>>>>>>>>>>>>>>>> alloca
instruction (similar to what was suggested by
>>>>>>>>>>>>>>>>> Nicolai).
>>>>>>>>>>>>>>>>> And, we can
say that if alloca with 'unobserved' is
>>>>>>>>>>>>>>>>> used by
>>>>>>>>>>>>>>>> ptrtoint/icmp,
>>>>>>>>>>>>>>>>> it is UB.
>>>>>>>>>>>>>>>> The flag can be
added, like we add other attributes. It
>>>>>>>>>>>>>>>> should not
>>>>>>>>>>>>>>>> be required for
any optimization we talked about
>>>>>>>>>>>>>>>> though. It
basically
>>>>>>>>>>>>>>>> is a way to
manifest derived or given information into
>>>>>>>>>>>>>>>> the IR.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Attribute
deduction, as well as frontends with domain
>>>>>>>>>>>>>>>> knowledge,
>>>>>>>>>>>>>>>> can add such
information. The flag we discussed in phab
>>>>>>>>>>>>>>>> was not even
>>>>>>>>>>>>>>>> sufficient for
all the transformation examples I
>>>>>>>>>>>>>>>> presented in my
mail,
>>>>>>>>>>>>>>>> that is why I
extended my  argument. We could still
>>>>>>>>>>>>>>>> have a
"noescape"
>>>>>>>>>>>>>>>> flag for
allocas, but I'm not sure how useful that
>>>>>>>>>>>>>>>> really is. We
can
>>>>>>>>>>>>>>>> certainly
deduce it and manifest it, unsure if we have
>>>>>>>>>>>>>>>> domain
knowledge
>>>>>>>>>>>>>>>> we can use for
non-trivial cases though.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> But this
makes ptrtoint/icmp make UB-raising
>>>>>>>>>>>>>>>>>
instructions, which
>>>>>>>>>>>>>>>>> contradicts
with what LLVM does.
>>>>>>>>>>>>>>>> As with other
violation of attributes I would, on first
>>>>>>>>>>>>>>>> though, suggest
>>>>>>>>>>>>>>>> to produce
poison, not UB.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Also,
existing optimizations like loop versioning can
>>>>>>>>>>>>>>>>> introduce
>>>>>>>>>>>>>>>>>
ptrtoint/pointer comparisons too.
>>>>>>>>>>>>>>>> Sure. I am not
certain why that is a problem. I get the
>>>>>>>>>>>>>>>> feeling things
>>>>>>>>>>>>>>>> are still mixed
up here.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> What I proposed
is twofold:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 1) We stop
folding comparisons between different
>>>>>>>>>>>>>>>> allocas if
changing the
>>>>>>>>>>>>>>>> address
>>>>>>>>>>>>>>>>       of both
might be observable. Thus, if both might
>>>>>>>>>>>>>>>> have their
address
>>>>>>>>>>>>>>>>
"taken"/escaped,
>>>>>>>>>>>>>>>>       other
than the comparisons we want to fold, we
>>>>>>>>>>>>>>>> cannot proceed.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 2) We define
lifetime markers to mean `memset(undef)`.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The first
should be sufficient for the problem you were
>>>>>>>>>>>>>>>> trying to solve
>>>>>>>>>>>>>>>> in the
>>>>>>>>>>>>>>>> first place.
The second makes lifetime markers less
>>>>>>>>>>>>>>>> weird. Note
that 1)
>>>>>>>>>>>>>>>> is not changing
>>>>>>>>>>>>>>>> the semantics
of the IR. We basically just argue there
>>>>>>>>>>>>>>>> is a bug in our
>>>>>>>>>>>>>>>> instcombine
right
>>>>>>>>>>>>>>>> now as we do
not check all necessary preconditions.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I see that
there are other questions that I didn't
>>>>>>>>>>>>>>>>> answer yet,
but let
>>>>>>>>>>>>>> me
>>>>>>>>>>>>>>>>> answer this
first to limit the length of the text :)
>>>>>>>>>>>>>>>> Sure, we can
split the discussion :)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> ~ Johannes
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>> Juneyoung
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Thu, Jan
7, 2021 at 3:36 AM Johannes Doerfert <
>>>>>>>>>>>>>>>>
johannesdoerfert at gmail.com>
>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On
1/5/21 8:00 PM, Juneyoung Lee wrote:
>>>>>>>>>>>>>>>>>>> Hi
Johannes,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I
read your proposal and thought about the model.
>>>>>>>>>>>>>>>>>> Cool,
thanks!
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> As
you concerned in A3, certain programs may be
>>>>>>>>>>>>>>>>>>>
valid only when
>>>>>>>>>>>>>> memory
>>>>>>>>>>>>>>>>>>>
blocks with overlapping lifetimes have disjoint
>>>>>>>>>>>>>>>>>>>
addresses.
>>>>>>>>>>>>>>>>>>>
Look at this example (I'm using malloc, but alloca
>>>>>>>>>>>>>>>>>>>
also works):
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> p1
= malloc(4)
>>>>>>>>>>>>>>>>>>> p2
= malloc(4) // for brevity, assume that there as
>>>>>>>>>>>>>>>>>>>
enough space & p1
>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>> p2
!= null
>>>>>>>>>>>>>>>>>>>
set<char*> s;
>>>>>>>>>>>>>>>>>>>
s.insert(p1); s.insert(p2); // If the second insert
>>>>>>>>>>>>>>>>>>> did
nothing, it
>>>>>>>>>>>>>>>> would
>>>>>>>>>>>>>>>>>>> be
surprise to programmers
>>>>>>>>>>>>>>>>>>>
work(s);
>>>>>>>>>>>>>>>>>>>
free(data1)
>>>>>>>>>>>>>>>>>>>
free(data2)
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
Clearly, IR semantics should guarantee that escaped
>>>>>>>>>>>>>>>>>>>
blocks are
>>>>>>>>>>>>>>>> disjoint.
>>>>>>>>>>>>>>>>>> It
>>>>>>>>>>>>>>>>>>>
would be great for verification tools on LLVM IR to
>>>>>>>>>>>>>>>>>>> be
able to answer
>>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>> the
second insert will succeed.
>>>>>>>>>>>>>>>>>> I
agree, the second insert should succeed, assuming
>>>>>>>>>>>>>>>>>> `p1
&& p2`.
>>>>>>>>>>>>>>>>>> I
don't think my proposal would in any way impact the
>>>>>>>>>>>>>>>>>> program
above,
>>>>>>>>>>>>>>>>>> if
anything the A3 reasoning makes sure such a
>>>>>>>>>>>>>>>>>> program
with allocas
>>>>>>>>>>>>>>>>>> is not
miscompiled.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I'm
also not sure I understand what you try to argue
>>>>>>>>>>>>>>>>>> for.
Maybe
>>>>>>>>>>>>>>>>>>
elaborate a bit what it is you think is bad or needs
>>>>>>>>>>>>>>>>>> to be
changed.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> The
problem is that definition of escapedness is not
>>>>>>>>>>>>>>>>>>>
clear at the
>>>>>>>>>>>>>>>>>>
semantic
>>>>>>>>>>>>>>>>>>>
level. Describing the IR semantics w.r.t. LLVM's
>>>>>>>>>>>>>>>>>>>
escaped analysis
>>>>>>>>>>>>>>>>>>
function
>>>>>>>>>>>>>>>>>>>
isn't something we want.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
Semantic definition of escapedness of a pointer
>>>>>>>>>>>>>>>>>>>
seems hard, I mean
>>>>>>>>>>>>>> in a
>>>>>>>>>>>>>>>>>>>
stepwise manner.
>>>>>>>>>>>>>>>>>>> It
isn't a single instruction such as 'escape i8*
>>>>>>>>>>>>>>>>>>>
ptr', and we need
>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>> look
>>>>>>>>>>>>>>>>>>>
over all instructions in the function. For example,
>>>>>>>>>>>>>>>>>>>
'(int)(p+1) -
>>>>>>>>>>>>>>>> (int)p'
>>>>>>>>>>>>>>>>>>>
isn't semantically escaping the pointer p because
>>>>>>>>>>>>>>>>>>> the
result is 1
>>>>>>>>>>>>>>>>>>>
regardless of the value of p.
>>>>>>>>>>>>>>>>>>>
Stepwisely defining the semantics of instructions is
>>>>>>>>>>>>>>>>>>> a
desirable
>>>>>>>>>>>>>>>>>>
direction
>>>>>>>>>>>>>>>>>>>
IMO.
>>>>>>>>>>>>>>>>>> I'm
confused. What in the proposal would prevent us
>>>>>>>>>>>>>>>>>> from
defining
>>>>>>>>>>>>>>>>>> the
semantics of instructions, or force us to do it
>>>>>>>>>>>>>>>>>> in an
"undesirable
>>>>>>>>>>>>>>>>>>
way"?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> In
practice, syntactically checking escapedness +
>>>>>>>>>>>>>>>>>>>
nice engineering
>>>>>>>>>>>>>>>> might
>>>>>>>>>>>>>>>>>>> not
break optimizations in most cases (as
>>>>>>>>>>>>>>>>>>>
undef/poison did); but it
>>>>>>>>>>>>>>>> would
>>>>>>>>>>>>>>>>>>> be
great to move to another level, since LLVM IR is
>>>>>>>>>>>>>>>>>>>
used in so many
>>>>>>>>>>>>>>>>>> places
>>>>>>>>>>>>>>>>>>> :)
>>>>>>>>>>>>>>>>>> The
property under which you can coalesce objects is
>>>>>>>>>>>>>>>>>> simple:
>>>>>>>>>>>>>>>>>>      
It is not observable.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Now, if
the address of one of the two objects you
>>>>>>>>>>>>>>>>>>
coalesce is not
>>>>>>>>>>>>>>>>>>
observed, coalescing is not observable. That is a
>>>>>>>>>>>>>>>>>>
sufficient condition
>>>>>>>>>>>>>>>>>> not a
necessary one. Pointer "escaping" is one step
>>>>>>>>>>>>>>>>>>
further. If the
>>>>>>>>>>>>>>>>>> address
doesn't escape it is not observed. This does
>>>>>>>>>>>>>>>>>> not
mean the
>>>>>>>>>>>>>>>>>>
"semantic conditions for coalescing", e.g., for the
>>>>>>>>>>>>>>>>>> purpose
of
>>>>>>>>>>>>>>>> translation
>>>>>>>>>>>>>>>>>>
validation, is supposed to be build on top of our
>>>>>>>>>>>>>>>>>>
"definition of
>>>>>>>>>>>>>>>> escaping
>>>>>>>>>>>>>>>>>>
pointers". That said, we use "does not escape" as a
>>>>>>>>>>>>>>>>>>
precondition for
>>>>>>>>>>>>>>>>>> various
transformation and I'm unsure what is any
>>>>>>>>>>>>>>>>>>
different now. The
>>>>>>>>>>>>>>>>>> entire
escape argument is only used in the validity
>>>>>>>>>>>>>>>>>> of the
pointer
>>>>>>>>>>>>>>>> folding.
>>>>>>>>>>>>>>>>>>
Similarly, we can fold a comparison of a noalias
>>>>>>>>>>>>>>>>>> pointer
with another
>>>>>>>>>>>>>>>> one
>>>>>>>>>>>>>>>>>> if the
former does not escape (and both are
>>>>>>>>>>>>>>>>>>
dereferenced and one is
>>>>>>>>>>>>>>>>>> written
for sure).
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> The
pointer comparison is another beast. It indeed
>>>>>>>>>>>>>>>>>>> has
a few issues,
>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>
solving it might require nontrivial solution.
>>>>>>>>>>>>>>>>>> I think
the main problem of the inconsistencies and
>>>>>>>>>>>>>>>>>> such
we've seen is
>>>>>>>>>>>>>>>>>> rooted
in the erroneous folding of pointer
>>>>>>>>>>>>>>>>>>
comparisons. Cleaning up
>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>
lifetime marker semantics is actually unrelated and
>>>>>>>>>>>>>>>>>> simply
not folding
>>>>>>>>>>>>>>>>>> as
described in A3 should solve the issue that has
>>>>>>>>>>>>>>>>>> been
reported.
>>>>>>>>>>>>>>>>>>
Nevertheless,
>>>>>>>>>>>>>>>>>> we
should take a crack at lifetime markers while we
>>>>>>>>>>>>>>>>>> are
here.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> ~
Johannes
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
Thanks,
>>>>>>>>>>>>>>>>>>>
Juneyoung
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On
Tue, Jan 5, 2021 at 9:37 AM Johannes Doerfert <
>>>>>>>>>>>>>>>>>>
johannesdoerfert at gmail.com>
>>>>>>>>>>>>>>>>>>>
wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
Hi Juneyoung,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
Happy new year :)
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
After we had a lengthy discussion on phab last
>>>>>>>>>>>>>>>>>>>>
year, I've tried to
>>>>>>>>>>>>>>>>>>>>
summarize my thoughts,
>>>>>>>>>>>>>>>>>>>>
especially given that I had some time to think
>>>>>>>>>>>>>>>>>>>>
about things over the
>>>>>>>>>>>>>>>>>> break.
>>>>>>>>>>>>>>>>>>>>
Still, no promises on the quality ;)
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
I start with general questions I asked myself and
>>>>>>>>>>>>>>>>>>>>
then go on
>>>>>>>>>>>>>> rambling
>>>>>>>>>>>>>>>>>>>>
about a potential design.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
Q1: Is lifetime a given property or a derived one,
>>>>>>>>>>>>>>>>>>>>
thus is it fixed
>>>>>>>>>>>>>> or
>>>>>>>>>>>>>>>>>>>>
modifiable?
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
This is a question I asked myself a lot recently. I
>>>>>>>>>>>>>>>>>>>>
think it is
>>>>>>>>>>>>>>>> derived
>>>>>>>>>>>>>>>>>>>>
and modifiable,
>>>>>>>>>>>>>>>>>>>>
at least I hope it is. Only that would allow
>>>>>>>>>>>>>>>>>>>>
transformations I would
>>>>>>>>>>>>>>>>>> want
>>>>>>>>>>>>>>>>>>>>
us to do. Here are some examples:
>>>>>>>>>>>>>>>>>>>>
https://godbolt.org/z/G8obj3
>>>>>>>>>>>>>>>>>>>>
https://godbolt.org/z/obaTc
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
Q2: Is a pointer comparison, or similar use,
>>>>>>>>>>>>>>>>>>>>
extending the lifetime?
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
Asked differently, can we hoist a pointer
>>>>>>>>>>>>>>>>>>>>
comparison into a region
>>>>>>>>>>>>>>>> where
>>>>>>>>>>>>>>>>>>>>
the pointer is dead?
>>>>>>>>>>>>>>>>>>>>
This is an important question which we haven't
>>>>>>>>>>>>>>>>>>>>
discussed much as we
>>>>>>>>>>>>>>>>>>>>
assumed LICM has to work.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
The current behavior is that non-dereferencing uses
>>>>>>>>>>>>>>>>>>>>
are not
>>>>>>>>>>>>>> extending
>>>>>>>>>>>>>>>>>>>>
the lifetime and are
>>>>>>>>>>>>>>>>>>>>
allowed outside of "lifetime regions" (as indicated
>>>>>>>>>>>>>>>>>>>>
by markers).
>>>>>>>>>>>>>> They
>>>>>>>>>>>>>>>>>>>>
will always produce valid
>>>>>>>>>>>>>>>>>>>>
results. Though, we might want to think about a
>>>>>>>>>>>>>>>>>>>>
lifetime marker that
>>>>>>>>>>>>>>>>>>>>
spits out a new pointer
>>>>>>>>>>>>>>>>>>>>
value instead of reusing the old one.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
Q3: Can we use lifetime to argue about addresses?
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
The question here is, can we fold address
>>>>>>>>>>>>>>>>>>>>
comparisons based on
>>>>>>>>>>>>>>>>>>>>
lifetimes, or not.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
So far, we fold comparisons based on "address
>>>>>>>>>>>>>>>>>>>>
information". For
>>>>>>>>>>>>>>>> example,
>>>>>>>>>>>>>>>>>>>>
we "know" globals,
>>>>>>>>>>>>>>>>>>>>
allocas, and mallocs cannot be equal to one
>>>>>>>>>>>>>>>>>>>>
another. Also, two
>>>>>>>>>>>>>>>> distinct
>>>>>>>>>>>>>>>>>>>>
allocations, for globals
>>>>>>>>>>>>>>>>>>>>
and allocas, are considered unequal. Now, the crux
>>>>>>>>>>>>>>>>>>>>
is that we have
>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>> be
>>>>>>>>>>>>>>>>>>>>
consistent if we do two
>>>>>>>>>>>>>>>>>>>>
comparisons, and, as of now, we are not (bug number
>>>>>>>>>>>>>>>>>>>>
missing). Since
>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>
backend (or any other place
>>>>>>>>>>>>>>>>>>>>
for that matter) might coalesce allocas, their
>>>>>>>>>>>>>>>>>>>>
addresses might not
>>>>>>>>>>>>>> be
>>>>>>>>>>>>>>>>>>>>
different after all. If we
>>>>>>>>>>>>>>>>>>>>
already folded a comparison to "unequal" we are
>>>>>>>>>>>>>>>>>>>>
doomed if we later
>>>>>>>>>>>>>>>> have
>>>>>>>>>>>>>>>>>>>>
a comparison that results
>>>>>>>>>>>>>>>>>>>>
in "equal". (Note, this is different from aliasing
>>>>>>>>>>>>>>>>>>>>
rules as they can
>>>>>>>>>>>>>>>> be
>>>>>>>>>>>>>>>>>>>>
stricter.)
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
Design:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
I would hope we can come up with a model that
>>>>>>>>>>>>>>>>>>>>
treats memory "the
>>>>>>>>>>>>>>>> same",
>>>>>>>>>>>>>>>>>>>>
regardless if it is global,
>>>>>>>>>>>>>>>>>>>>
stack, or heap. I want to avoid special casing one
>>>>>>>>>>>>>>>>>>>>
of them wrt.
>>>>>>>>>>>>>>>> lifetime
>>>>>>>>>>>>>>>>>>>>
as I believe most optimizations
>>>>>>>>>>>>>>>>>>>>
would apply to any of them, potentially for
>>>>>>>>>>>>>>>>>>>>
different reasons and
>>>>>>>>>>>>>> with
>>>>>>>>>>>>>>>>>>>>
different gains, but nevertheless.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
Proposal (largely based on the conversation in phab):
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
A1: Lifetime is a concept that talks about memory
>>>>>>>>>>>>>>>>>>>>
content *only*.
>>>>>>>>>>>>>>>>>>>>
Basically, the memory content is set to
>>>>>>>>>>>>>>>>>>>>
         undefined by lifetime markers. It is
>>>>>>>>>>>>>>>>>>>>
derived/modifiable as
>>>>>>>>>>>>>> it
>>>>>>>>>>>>>>>> is
>>>>>>>>>>>>>>>>>>>>
just an "as-is" property of the memory
>>>>>>>>>>>>>>>>>>>>
         content. The lifetimes of an object, as
>>>>>>>>>>>>>>>>>>>>
described by
>>>>>>>>>>>>>> markers,
>>>>>>>>>>>>>>>>>> might
>>>>>>>>>>>>>>>>>>>>
change during the compilation. They
>>>>>>>>>>>>>>>>>>>>
         might become smaller if we deduce the
>>>>>>>>>>>>>>>>>>>>
object is not accessed
>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>
the memory content is not used, they
>>>>>>>>>>>>>>>>>>>>
         might become larger if objects with
>>>>>>>>>>>>>>>>>>>>
non-overlapping
>>>>>>>>>>>>>> lifetimes
>>>>>>>>>>>>>>>> are
>>>>>>>>>>>>>>>>>>>>
coalesced. (One could see the latter as
>>>>>>>>>>>>>>>>>>>>
         introducing a new object though.)
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
A2: If we define lifetime as above, it has nothing
>>>>>>>>>>>>>>>>>>>>
to do with the
>>>>>>>>>>>>>>>>>>>>
address of an object. Consequently, pointer
>>>>>>>>>>>>>>>>>>>>
         comparisons and similar operations are
>>>>>>>>>>>>>>>>>>>>
valid outside the
>>>>>>>>>>>>>>>> lifetime.
>>>>>>>>>>>>>>>>>>>>
Loads and stores are as well, they can
>>>>>>>>>>>>>>>>>>>>
         even not be removed "easily". A store
>>>>>>>>>>>>>>>>>>>>
followed by a lifetime
>>>>>>>>>>>>>>>>>> marker
>>>>>>>>>>>>>>>>>>>>
or a load following a lifetime marker
>>>>>>>>>>>>>>>>>>>>
         is dead or results in undef respectively.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
A3: We could not use lifetime to argue about
>>>>>>>>>>>>>>>>>>>>
addresses. This means
>>>>>>>>>>>>>> we
>>>>>>>>>>>>>>>>>>>>
could/should also not argue that overlapping
>>>>>>>>>>>>>>>>>>>>
         lifetimes result in different addresses.
>>>>>>>>>>>>>>>>>>>>
Thus, a comparison
>>>>>>>>>>>>>>>>>> between
>>>>>>>>>>>>>>>>>>>>
the address of two allocas could not
>>>>>>>>>>>>>>>>>>>>
         immediately be folded. That said, there
>>>>>>>>>>>>>>>>>>>>
would be special
>>>>>>>>>>>>>> cases
>>>>>>>>>>>>>>>>>>>>
though. Basically, if one of the allocas does
>>>>>>>>>>>>>>>>>>>>
         not escape, other than the comparisons to
>>>>>>>>>>>>>>>>>>>>
be folded, we can
>>>>>>>>>>>>>>>> fold
>>>>>>>>>>>>>>>>>>>>
them. Afterwards, coalescing or splitting
>>>>>>>>>>>>>>>>>>>>
         would still be consistent because it is
>>>>>>>>>>>>>>>>>>>>
unobservable. The
>>>>>>>>>>>>>>>> problem
>>>>>>>>>>>>>>>>>>>>
we have in-tree is that we fold even though
>>>>>>>>>>>>>>>>>>>>
         the address is still observed (after the
>>>>>>>>>>>>>>>>>>>>
fold). It is still
>>>>>>>>>>>>>>>>>> unclear
>>>>>>>>>>>>>>>>>>>>
to me what the impact of this would be
>>>>>>>>>>>>>>>>>>>>
         on real code. I suggested before that we
>>>>>>>>>>>>>>>>>>>>
run some
>>>>>>>>>>>>>> experiments
>>>>>>>>>>>>>>>>>> first
>>>>>>>>>>>>>>>>>>>>
before we make any decision whatsoever.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
This is pretty much saying that lifetime markers are
>>>>>>>>>>>>>> `memset(undef)`,
>>>>>>>>>>>>>>>> as
>>>>>>>>>>>>>>>>>>>>
you suggested before (I think).
>>>>>>>>>>>>>>>>>>>>
There are some implementation level differences but
>>>>>>>>>>>>>>>>>>>>
at the end of
>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>
day they are basically the same.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
Happy to hear some thoughts on this, especially if
>>>>>>>>>>>>>>>>>>>>
it fixes the
>>>>>>>>>>>>>>>> problems
>>>>>>>>>>>>>>>>>>>>
that lead to D93376 in the first place.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
~ Johannes
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
On 12/18/20 2:42 AM, Juneyoung Lee via llvm-dev wrote:
>>>>>>>>>>>>>>>>>>>>>
Hello all,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
We're discussing the well-formedness of
>>>>>>>>>>>>>>>>>>>>>
lifetime.start/end
>>>>>>>>>>>>>> intrinsic
>>>>>>>>>>>>>>>>>>>>
here (
>>>>>>>>>>>>>>>>>>>>>
https://reviews.llvm.org/D93376), deciding what is a
>>>>>>>>>>>>>> (syntactically
>>>>>>>>>>>>>>>> &
>>>>>>>>>>>>>>>>>>>>>
semantically) valid use of these intrinsics.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
I'd like to gather more context about the intrinsics.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
1. Is there a frontend or framework that
>>>>>>>>>>>>>>>>>>>>>
introduces lifetime call
>>>>>>>>>>>>>>>> with
>>>>>>>>>>>>>>>>>>>>>
non-stack allocated objects?
>>>>>>>>>>>>>>>>>>>>>
If exists, which behavior do they expect from it?
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
2. Can a block's lifetime be started bytewise or
>>>>>>>>>>>>>>>>>>>>>
elementwise?
>>>>>>>>>>>>>>>>>>>>>
I imagine an optimization that allow using stack
>>>>>>>>>>>>>>>>>>>>>
very compactly,
>>>>>>>>>>>>>> but
>>>>>>>>>>>>>>>>>>>>
wonder
>>>>>>>>>>>>>>>>>>>>>
there is a real-world use case.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
Thanks,
>>>>>>>>>>>>>>>>>>>>>
Juneyoung
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
_______________________________________________
>>>>>>>>>>>>>>>>>>>>>
LLVM Developers mailing list
>>>>>>>>>>>>>>>>>>>>>
llvm-dev at lists.llvm.org
>>>>>>>>>>>>>>>>>>>>>
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>
>>>
>

Ralf Jung via llvm-dev

2021-Feb-08 18:37 UTC

head link

[llvm-dev] lifetime.start/end

Hi Johannes,
>>> I still believe folding can be done regardless of their lifetime,
as long as
>>> it is done consistent. Fold all pointer observations and the actual
>>> placement is not observable, we went down that road.
>>
>> Sure, but that's the kind of exception that always applies when
discussing
>> optimizations.
>> But to make my statement more precise: you cannot fold comparison of
*escaped*
>> pointers to false unless the pointers are *definitely not equal*, e.g.
if
>> their lifetimes overlap.
>>
> The interesting part is, lifetimes change. I continue to argue you cannot
fold
> if both pointers have escaped. If you do, and other things happen that
fiddle
> with the lifetime, you might accidentally coalesce them (explicitly or
implicitly).
Lifetimes can only change if that does not add new observable behavior. For 
example, of two lifetimes overlap in the original program, then shrinking then 
is not allowed since that would make previously-definitely-inequal pointers now 
be potentially equal.

To give an example (using C syntax but I am thinking of LLVM semantics here):

int *x = (int*)malloc(4);
int *y = (int*)malloc(4);
bool eq = (x == y);
free(x);
free(y);
print(eq);

Here, moving the "free(x)" up to before the "int *y" would
be incorrect since
after that transformation, "eq" might be "true", which
previously it never was.
>> My understanding is that lifetime markers are *explicitly intended to
support
>> coalescing* by permitting later phases to put allocas with disjoint
lifetimes
>> into the same stack slot. D93376 intended to keep it that way;
fundamentally
>> changing what lifetime markers are about by making them only about the
content
>> and not the address of the alloca seemed like a much more invasive
change.
>> Am I misunderstanding something?
> 
> I do think they were designed for that, yes. And I do agree my change looks
more
> invasive from that perspective.
> 
> Now does D93376 help with the problems we have is my question. Or, asked 
> differently, what is a way out of this which improves the big picture, and
what
> is practical. For the first, limiting it to allocas seems to be the wrong
way.
> So far, the argument seems to be that "we only use it for them",
but I'm not
> sure why a use for other pointers would be somehow bad 
> (https://godbolt.org/z/G8obj3). Second, limiting it to syntactic constructs
is
> the opposite of practical. It has been said before, syntactic constraints
are
> brittle. Beyond that, the syntactic def-use chain requirement has not even
come
> up in any of this mails, at least as far as I understand it. It still is
unclear
> to me how that would help.
The restriction to alloca arises (I think) entirely from the fact that some 
syntactic restriction is believed to be needed if the lifetime markers are 
supposed to be able to affect whether allocations (that have their addresses 
observed) may overlap in memory. For malloc, a syntactic restriction clearly 
makes no sense, so if a syntactic restriction is needed then only allowing 
alloca makes that much more feasible.

The "proper" way to do this would be to decorate the *alloca* with
annotations
saying for which parts of the function the allocation has to be live, and to use
that to define which alloca may observably overlap in memory and which may not. 
This would have a rather clear operational semantics. But sadly that's not
how
the lifetime markers have been designed, so the syntactic restriction is a way 
to "recover" that kind of a design: if each lifetime marker can be
clearly
associated to some alloca, then it's "as if" that marker was just
an annotation
at the alloca, and we can proceed as above.

(I do understand that these annotations would probably be not very practical, 
and one can view the markers as a convenient way for the frontend to convey 
those annotations to LLVM. But if that's what they are then they should be 
treated like that -- possibly by having some pre-processing pass that translates
them to annotations and removes the markers, so later optimizations don't
have
to worry about this any more. Maybe this would be a possible fallback plan if 
your suggestion does not pan out.)
> That all said, let's see what we need:
> 1) A way to ensure we don't create inconsistent situations wrt.
allocation
> addresses where there were none.
> 2) A way to ensure we can reasonably coalesce allocas to minimize stack
usage.
First and foremost we need a precise specification for what the operational 
behavior of any given LLVM IR program is -- and D93376 helps with that by 
proposing a clear operational behavior for the lifetime markers.

That was the original goal, anyway. (Or so I think -- I am not an author of that
change proposal.) Since then the scope broadened once you pointed out that 
comparison folding is still tricky even after we have a precise semantics for 
allocation.
> For one, I feel we converged pretty much on the fact that pointer
comparison
> folding is a problem. In the example below we have lifetime markers and we
use
> them to argue about lifetimes of the allocations, but we still agree the
folding
> should not happen. Once the folding is restricted to pointer pairs for
which at
> least one doesn't have it's address observed, I would like to
revisit 1). So
> far, I believe it would be resolved. For 2), we could stick with what we
have
> now, or adapt something new afterwards. I guess the existing wording needs
to be
> cleared up either way.
What we have now (in terms of specification for the lifetime markers) is, as far
as I am concerned, a rather vague sketch that should be turned into something 
more precise and operational. D93376 helps fix that, but it turns out (I guess 
unsurprisingly) that there is disagreement about what the more precise semantics
even should be.
>> Note that we do not need to talk about coalescing here, we are just
exploiting
>> the fact that allocation is non-deterministic and that allocations with
>> non-overlapping lifetime can be in the same spot or in different spots.
> 
> Sure. When I refer to coalescing I was also including "implicit
coalescing",
> e.g., as happened in my outlining example.
Oh I see. The question of "is this an allowed program transformation"
and "is
this an allowed execution" are closely related of course, but I still
usually
think of them very differently, in particular when there is lots of 
non-determinism as in our case.

After all, there's a difference between "this comparison might yield
false at
runtime" and "the compiler can replace this comparison by false"
-- as we just
discussed.
> I am basically struggling to understand why we need to introduce syntactic 
> restrictions on lifetime markers and why they need to be limited to
allocas.
> Those are two concepts included in D93376, right?
I gave my view on this above.
>> I agree that making the markers just erase the contents of the alloca
is the
>> simplest proposal on the table. *If* we truly can make LLVM follow that
model,
>> that would be very nice.
>> I would just be very surprised if it was possible to get this proposal
past
>> the people that will carefully watch over the existing optimization
potential
>> LLVM has, and make sure it does not become smaller. ;)  Maybe I am too 
>> pessimistic, but I would have never dared even making such a proposal
as I
>> would expect it to be shot down. I'll be happy to be proven wrong
about this. :)
> 
> I've made a few actually, `mustprogress` was the latest and you were
involved
> ;). It does prevent certain optimizations we did before unconditionally but
it
> also fixes a conceptual problem.
That's fair! And thanks for that btw. :)
> At least the pointer inconsistency is a 
> conceptual problem that needs fixing. What we do with lifetime markers is 
> probably orthogonal.
I feel like there's also a conceptual problem with lifetime markers, because
their spec is so vague that it becomes hard to say in general what their 
observable effect on program behavior is. Fundamentally, that's what I am
most
interested in fixing.
I am not sure to what extend I can help with the pointer inconsistency -- the 
spec is fairly clear there it seems, so if that's a transformation LLVM is 
currently doing, then it probably just has to stop doing that?

Kind regards,
Ralf
> 
> ~ Johannes
> 
> 
>>
>> Kind regards,
>> Ralf
>>
>>>
>>> ~ Johannes
>>>
>>>
>>>
>>>>
>>>> Kind regards,
>>>> Ralf
>>>>
>>>>>
>>>>> ~ Johannes
>>>>>
>>>>>
>>>>>>
>>>>>> Kind regards,
>>>>>> Ralf
>>>>>>
>>>>>>> A lot of my arguments around alloca/malloc/... is
that we can augment the
>>>>>>> information from the frontend by determining if an
address is observed or
>>>>>>> not. If it is not, we can limit its allocation,
coalesce it, etc.
>>>>>>> Lifetime markers can help us do this but they are
not the only way.
>>>>>>> So, basically, we do not need lifetime markers to
coalesce
>>>>>>> alloca/malloc/... but that they can help.
>>>>>>
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> Let's try to get out of this rabbit hole. If
you have a problem with the
>>>>>>> proposal as outlined in 
>>>>>>>
https://lists.llvm.org/pipermail/llvm-dev/2021-January/147594.html ,
>>>>>>> please let me know. I assume dealing with observed
addresses different
>>>>>>> than with non-observede ones (implied by noescape)
is the way forward now.
>>>>>>>
>>>>>>> ~ Johannes
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Kind regards,
>>>>>>>> Ralf
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Kind regards,
>>>>>>>>>> Ralf
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Does that make some sense?
>>>>>>>>>>>
>>>>>>>>>>> ~ Johannes
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Kind regards,
>>>>>>>>>>>> Ralf
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 1/7/21 10:12 PM,
Juneyoung Lee wrote:
>>>>>>>>>>>>>> On Fri, Jan 8, 2021 at
8:54 AM Johannes Doerfert
>>>>>>>>>>>>>> <johannesdoerfert at
gmail.com>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 1/7/21 4:55 PM,
Juneyoung Lee wrote:
>>>>>>>>>>>>>>>> It is, alloca
must have its integral address decided at its
>>>>>>>>>>>>>>>> *allocation*
>>>>>>>>>>>>>>>> time as well,
like 0xFF00... .
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> For example, if
0xFFFF0000 ~ 0xFF000000 is already allocated,
>>>>>>>>>>>>>>>> the new
>>>>>>>>>>>>>>>> alloca cannot
be placed there unless it is not observed in the
>>>>>>>>>>>>>>>> future,
>>>>>>>>>>>>>>>> according to
your proposal.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> But how do we
know from the current state whether the stack
>>>>>>>>>>>>>>>> variable is
>>>>>>>>>>>>>>>> going to be
observed or not?
>>>>>>>>>>>>>>> With that argument
you could allocate all but 4 bytes, do a
>>>>>>>>>>>>>>> malloc(4)
>>>>>>>>>>>>>>> and know the
address of the returned pointer (assuming it is not
>>>>>>>>>>>>>>> null).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> What I try to say
is, either your scenario is part of the model and
>>>>>>>>>>>>>>> everything we do is
broken as you could "observe" addresses
>>>>>>>>>>>>>>> passively,
>>>>>>>>>>>>>>> *or*, the abstract
machine we use for semantic reasoning doesn't
>>>>>>>>>>>>>>> permit
>>>>>>>>>>>>>>> the above
reasoning. I really hope for the latter.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> That's a very valid
point..!
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Actually, I have a
memory model that addresses this problem.
>>>>>>>>>>>>>> The gist is that we can
interpret each allocation instruction as
>>>>>>>>>>>>>> *creating
>>>>>>>>>>>>>> 2 blocks* and
nondeterministically returning one of them.
>>>>>>>>>>>>>> The other one is marked
as invalid but not freed immediately.
>>>>>>>>>>>>>> Deallocation
>>>>>>>>>>>>>> frees both blocks.
>>>>>>>>>>>>>> If there is not enough
space for having two blocks, it is treated as
>>>>>>>>>>>>>> out-of-memory.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> It makes guessing the
address of an allocation according to memory
>>>>>>>>>>>>>> layout
>>>>>>>>>>>>>> invalid UB.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> p = malloc(4)
>>>>>>>>>>>>>> // If p != null, it is
guaranteed there was at least two 4 byte slots
>>>>>>>>>>>>>> available, say
0x100~0x104 and 0x110~0x114.
>>>>>>>>>>>>>> // Two blocks are
allocated at 0x110 and 0x114, and one of the
>>>>>>>>>>>>>> addresses is
>>>>>>>>>>>>>> nondeterministically
returned.
>>>>>>>>>>>>>> // By the nature of
nondeterminism, all executions below should be
>>>>>>>>>>>>>> well-defined regardless
of the address of p.
>>>>>>>>>>>>>> *(int*)0x100 = 10; //
In an execution where p is 0x110, this
>>>>>>>>>>>>>> raises UB.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The paper link is here
<https://dl.acm.org/doi/10.1145/3276495> FYI.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Nice! Thanks for the link
:)
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> To argue differently:
Who is to say there is a stack, or only one,
>>>>>>>>>>>>>>> or that alloca
allocates memory "on the one stack"? That is not
>>>>>>>>>>>>>>> part of
>>>>>>>>>>>>>>> the IR, IMHO. I can
write a machine on which alloca lowers to
>>>>>>>>>>>>>>> malloc,
>>>>>>>>>>>>>>> I don't even
need the free during stack unwind but that I could
>>>>>>>>>>>>>>> do as
>>>>>>>>>>>>>>> well if I wanted
to.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This is right, the
example was for illustrative purposes. IR does not
>>>>>>>>>>>>>> enforce alloca to be
placed at 'stack'.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Making
icmp/ptrtoint yield poison will still make loop
>>>>>>>>>>>>>>>> versioning or
>>>>>>>>>>>>>>>> pointer
rewriting transformations unsound because these
>>>>>>>>>>>>>>>> operations now
>>>>>>>>>>>>>>> can
>>>>>>>>>>>>>>>> create poison
(even if pointers are noundef).
>>>>>>>>>>>>>>> I did not say they
yield poison, at least I did not try to say that.
>>>>>>>>>>>>>>> What are you
referring to exactly?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I was referring to
this:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> But this makes
ptrtoint/icmp make UB-raising instructions, which
>>>>>>>>>>>>>>>> contradicts
with what LLVM does.
>>>>>>>>>>>>>>> As with other
violation of attributes I would, on first though,
>>>>>>>>>>>>>>> suggest
>>>>>>>>>>>>>>> to produce poison,
not UB.
>>>>>>>>>>>>>> But it is more about
the (imaginary) attribute, so maybe I was
>>>>>>>>>>>>>> slightly out
>>>>>>>>>>>>>> of topic.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2) is fine, I think
the suggestion semantically makes sense
>>>>>>>>>>>>>>> perfectly. 1)
>>>>>>>>>>>>>>>> is something
I'm concerned about now.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> There are more
than pointer foldings, such as rewriting such
>>>>>>>>>>>>>>>> expression,
>>>>>>>>>>>>>>>> code motion ptr
cmp, introduce ptr cmp, etc. There is also analysis
>>>>>>>>>>>>>>> relying
>>>>>>>>>>>>>>>> on ptr cmp.
>>>>>>>>>>>>>>>> Defining the
correctness of each of them is something we want to
>>>>>>>>>>>>>>>> avoid,
>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>> maybe
that's why we want to define precise semantics for things.
>>>>>>>>>>>>>>> I don't get the
point. My proposal does not change the semantics of
>>>>>>>>>>>>>>> pointer
comparisons, at all. I explicitly mentioned that in the last
>>>>>>>>>>>>>>> email.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Oh okay, I thought it
was a part of the lifetime proposal, but it
>>>>>>>>>>>>>> seems
>>>>>>>>>>>>>> more like a separate
thing.
>>>>>>>>>>>>>> I agree that this
requires performance impact.
>>>>>>>>>>>>>> Also investigation of
existing transformations would be needed;
>>>>>>>>>>>>>> Alive2's
>>>>>>>>>>>>>> pointer comparison is
doing approximation yet, but if it becomes
>>>>>>>>>>>>>> fully
>>>>>>>>>>>>>> precise, it will show
something from running LLVM unit tests I
>>>>>>>>>>>>>> believe..! :)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I think this will
be less aggressive and may give nice feedback to
>>>>>>>>>>>>>>>> potential
projects that are using lifetime with non-alloca.
>>>>>>>>>>>>>>> The lifetime marker
debate, basically 2) above, is orthogonal to the
>>>>>>>>>>>>>>> problem
>>>>>>>>>>>>>>> you try to solve.
It got mixed in as lifetime markers were used by
>>>>>>>>>>>>>>> StackColoring
>>>>>>>>>>>>>>> to perform
coalescing but that is coincidental. You can (arguably)
>>>>>>>>>>>>>>> coalesce stack
>>>>>>>>>>>>>>> allocations
regardless of lifetime markers and with 1) such a
>>>>>>>>>>>>>>> transformation
>>>>>>>>>>>>>>> (w/ and w/o
lifetime markers) would actually be sound.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> At the end, we
can implement IR writer that lowers lifetime with
>>>>>>>>>>>>>>> non-alloca
>>>>>>>>>>>>>>>> into
memset(undef). WDYT?
>>>>>>>>>>>>>>> Yeah, 2) is
orthogonal and we can lower it that way. Unsure if it is
>>>>>>>>>>>>>>> helpful
>>>>>>>>>>>>>>> but we can
certainly define it that way in the LangRef.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Okay, thanks!
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> ~ Johannes
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>> Juneyoung
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> p.s. The reply
was late, sorry. I think I can spend more time on
>>>>>>>>>>>>>>>> this
>>>>>>>>>>>>>>> today.
>>>>>>>>>>>>>>>> On Thu, Jan 7,
2021 at 9:02 AM Johannes Doerfert <
>>>>>>>>>>>>>>> johannesdoerfert at
gmail.com>
>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On 1/6/21
4:33 PM, Juneyoung Lee wrote:
>>>>>>>>>>>>>>>>>>>>
Stepwisely defining the semantics of instructions is a
>>>>>>>>>>>>>>>>>>>>
desirable
>>>>>>>>>>>>>>>>>>
direction
>>>>>>>>>>>>>>>>>>>>
IMO.
>>>>>>>>>>>>>>>>>>>
I'm confused. What in the proposal would prevent us from
>>>>>>>>>>>>>>>>>>>
defining
>>>>>>>>>>>>>>>>>>> the
semantics of instructions, or force us to do it in an
>>>>>>>>>>>>>>>>>>>
"undesirable
>>>>>>>>>>>>>>>>>>
way"?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I meant
it would be great if the output state after executing an
>>>>>>>>>>>>>>>>>>
instruction can be described using its input state.
>>>>>>>>>>>>>>>>>> (that
was the meaning of 'stepwise semantics', I should have
>>>>>>>>>>>>>>>>>> been
more
>>>>>>>>>>>>>>>>>> clear
about this)
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> For
example, the semantics of 'z = add x y' can be defined as
>>>>>>>>>>>>>>>>>>
follows:
>>>>>>>>>>>>>>>>>> Given
an input state s, next state s' = s[z -> s(x) + s(y)]
>>>>>>>>>>>>>>>>>> where
s(x) is the value of x in the previous state, and s[z ->
>>>>>>>>>>>>>>>>>> v] is a
>>>>>>>>>>>>>>>>>> state
with z updated to v.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Another
example that involves memory: the semantics of 'i =
>>>>>>>>>>>>>>>>>>
ptrtoint p'
>>>>>>>>>>>>>>>>> can
>>>>>>>>>>>>>>>>>> be
defined as follows:
>>>>>>>>>>>>>>>>>> Given
an input state s, next state s' = s[i -> s(p).obj.address +
>>>>>>>>>>>>>>>>>>
s(p).offset]
>>>>>>>>>>>>>>>>>> where
obj.address is the begin address of a memory object obj
>>>>>>>>>>>>>>>>>> pointed
>>>>>>>>>>>>>>> by
>>>>>>>>>>>>>>>>> p
>>>>>>>>>>>>>>>>>> &
offset is p's byte offset. (Imagine a pointer to the offset
>>>>>>>>>>>>>>>>>> of some
>>>>>>>>>>>>>>>>> char
>>>>>>>>>>>>>>>>>> array).
>>>>>>>>>>>>>>>>>> Note
that ptrtoint & add can be nicely defined w.r.t the input
>>>>>>>>>>>>>>>>>> state.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Now,
the instruction that we're facing is 'p = alloca'.
>>>>>>>>>>>>>>>>>> To
describe the output state after executing 'p = alloca', the
>>>>>>>>>>>>>>>>>> address
>>>>>>>>>>>>>>> of
>>>>>>>>>>>>>>>>>> new
alloca should be determined.
>>>>>>>>>>>>>>>>>> If
observedness is involved, we need to know the future state
>>>>>>>>>>>>>>>>>> again.
:/
>>>>>>>>>>>>>>>>> We
>>>>>>>>>>>>>>>>>>
don't know whether the alloca is going to be observed or not
>>>>>>>>>>>>>>>>>> without
>>>>>>>>>>>>>>>>> seeing
>>>>>>>>>>>>>>>>>> the
future.
>>>>>>>>>>>>>>>>>> This is
the problem of the current lifetime intrinsics as well.
>>>>>>>>>>>>>>>>> No, you mix
things up here.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Nobody
proposed to modify the semantics of `alloca`.
>>>>>>>>>>>>>>>>> `alloca`
provides you with a fresh, unobserved block of
>>>>>>>>>>>>>>>>>
dereferenceable memory that is implicitly freed as the stack
>>>>>>>>>>>>>>>>> unwinds.
That is it. No context necessary.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> If you want
to modify the IR, you need to argue the observable
>>>>>>>>>>>>>>>>> semantics
which is nothing new. That this might require more than
>>>>>>>>>>>>>>>>> a peephole
view of the program is also not new.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> One
possible approach to resolve this is adding an
>>>>>>>>>>>>>>>>>>
'unobserved' flag to
>>>>>>>>>>>>>>>>>> alloca
instruction (similar to what was suggested by Nicolai).
>>>>>>>>>>>>>>>>>> And, we
can say that if alloca with 'unobserved' is used by
>>>>>>>>>>>>>>>>>
ptrtoint/icmp,
>>>>>>>>>>>>>>>>>> it is
UB.
>>>>>>>>>>>>>>>>> The flag
can be added, like we add other attributes. It should not
>>>>>>>>>>>>>>>>> be required
for any optimization we talked about though. It
>>>>>>>>>>>>>>>>> basically
>>>>>>>>>>>>>>>>> is a way to
manifest derived or given information into the IR.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Attribute
deduction, as well as frontends with domain knowledge,
>>>>>>>>>>>>>>>>> can add
such information. The flag we discussed in phab was not
>>>>>>>>>>>>>>>>> even
>>>>>>>>>>>>>>>>> sufficient
for all the transformation examples I presented in
>>>>>>>>>>>>>>>>> my mail,
>>>>>>>>>>>>>>>>> that is why
I extended my  argument. We could still have a
>>>>>>>>>>>>>>>>>
"noescape"
>>>>>>>>>>>>>>>>> flag for
allocas, but I'm not sure how useful that really is.
>>>>>>>>>>>>>>>>> We can
>>>>>>>>>>>>>>>>> certainly
deduce it and manifest it, unsure if we have domain
>>>>>>>>>>>>>>>>> knowledge
>>>>>>>>>>>>>>>>> we can use
for non-trivial cases though.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> But
this makes ptrtoint/icmp make UB-raising instructions, which
>>>>>>>>>>>>>>>>>>
contradicts with what LLVM does.
>>>>>>>>>>>>>>>>> As with
other violation of attributes I would, on first though,
>>>>>>>>>>>>>>>>> suggest
>>>>>>>>>>>>>>>>> to produce
poison, not UB.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Also,
existing optimizations like loop versioning can introduce
>>>>>>>>>>>>>>>>>>
ptrtoint/pointer comparisons too.
>>>>>>>>>>>>>>>>> Sure. I am
not certain why that is a problem. I get the feeling
>>>>>>>>>>>>>>>>> things
>>>>>>>>>>>>>>>>> are still
mixed up here.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> What I
proposed is twofold:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 1) We stop
folding comparisons between different allocas if
>>>>>>>>>>>>>>>>> changing
the
>>>>>>>>>>>>>>>>> address
>>>>>>>>>>>>>>>>>       of
both might be observable. Thus, if both might have
>>>>>>>>>>>>>>>>> their
address
>>>>>>>>>>>>>>>>>
"taken"/escaped,
>>>>>>>>>>>>>>>>>       other
than the comparisons we want to fold, we cannot
>>>>>>>>>>>>>>>>> proceed.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 2) We
define lifetime markers to mean `memset(undef)`.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The first
should be sufficient for the problem you were trying
>>>>>>>>>>>>>>>>> to solve
>>>>>>>>>>>>>>>>> in the
>>>>>>>>>>>>>>>>> first
place. The second makes lifetime markers less weird. Note
>>>>>>>>>>>>>>>>> that 1)
>>>>>>>>>>>>>>>>> is not
changing
>>>>>>>>>>>>>>>>> the
semantics of the IR. We basically just argue there is a bug
>>>>>>>>>>>>>>>>> in our
>>>>>>>>>>>>>>>>> instcombine
right
>>>>>>>>>>>>>>>>> now as we
do not check all necessary preconditions.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I see
that there are other questions that I didn't answer yet,
>>>>>>>>>>>>>>>>>> but let
>>>>>>>>>>>>>>> me
>>>>>>>>>>>>>>>>>> answer
this first to limit the length of the text :)
>>>>>>>>>>>>>>>>> Sure, we
can split the discussion :)
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> ~ Johannes
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>
Juneyoung
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Thu,
Jan 7, 2021 at 3:36 AM Johannes Doerfert <
>>>>>>>>>>>>>>>>>
johannesdoerfert at gmail.com>
>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On
1/5/21 8:00 PM, Juneyoung Lee wrote:
>>>>>>>>>>>>>>>>>>>>
Hi Johannes,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
I read your proposal and thought about the model.
>>>>>>>>>>>>>>>>>>>
Cool, thanks!
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
As you concerned in A3, certain programs may be valid only when
>>>>>>>>>>>>>>> memory
>>>>>>>>>>>>>>>>>>>>
blocks with overlapping lifetimes have disjoint addresses.
>>>>>>>>>>>>>>>>>>>>
Look at this example (I'm using malloc, but alloca also works):
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
p1 = malloc(4)
>>>>>>>>>>>>>>>>>>>>
p2 = malloc(4) // for brevity, assume that there as enough
>>>>>>>>>>>>>>>>>>>>
space & p1
>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>
p2 != null
>>>>>>>>>>>>>>>>>>>>
set<char*> s;
>>>>>>>>>>>>>>>>>>>>
s.insert(p1); s.insert(p2); // If the second insert did
>>>>>>>>>>>>>>>>>>>>
nothing, it
>>>>>>>>>>>>>>>>> would
>>>>>>>>>>>>>>>>>>>>
be surprise to programmers
>>>>>>>>>>>>>>>>>>>>
work(s);
>>>>>>>>>>>>>>>>>>>>
free(data1)
>>>>>>>>>>>>>>>>>>>>
free(data2)
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
Clearly, IR semantics should guarantee that escaped blocks are
>>>>>>>>>>>>>>>>> disjoint.
>>>>>>>>>>>>>>>>>>> It
>>>>>>>>>>>>>>>>>>>>
would be great for verification tools on LLVM IR to be able
>>>>>>>>>>>>>>>>>>>>
to answer
>>>>>>>>>>>>>>>>>>>
that
>>>>>>>>>>>>>>>>>>>>
the second insert will succeed.
>>>>>>>>>>>>>>>>>>> I
agree, the second insert should succeed, assuming `p1 && p2`.
>>>>>>>>>>>>>>>>>>> I
don't think my proposal would in any way impact the program
>>>>>>>>>>>>>>>>>>>
above,
>>>>>>>>>>>>>>>>>>> if
anything the A3 reasoning makes sure such a program with
>>>>>>>>>>>>>>>>>>>
allocas
>>>>>>>>>>>>>>>>>>> is
not miscompiled.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
I'm also not sure I understand what you try to argue for. Maybe
>>>>>>>>>>>>>>>>>>>
elaborate a bit what it is you think is bad or needs to be
>>>>>>>>>>>>>>>>>>>
changed.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
The problem is that definition of escapedness is not clear
>>>>>>>>>>>>>>>>>>>>
at the
>>>>>>>>>>>>>>>>>>>
semantic
>>>>>>>>>>>>>>>>>>>>
level. Describing the IR semantics w.r.t. LLVM's escaped
>>>>>>>>>>>>>>>>>>>>
analysis
>>>>>>>>>>>>>>>>>>>
function
>>>>>>>>>>>>>>>>>>>>
isn't something we want.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
Semantic definition of escapedness of a pointer seems hard,
>>>>>>>>>>>>>>>>>>>>
I mean
>>>>>>>>>>>>>>> in a
>>>>>>>>>>>>>>>>>>>>
stepwise manner.
>>>>>>>>>>>>>>>>>>>>
It isn't a single instruction such as 'escape i8* ptr', and
>>>>>>>>>>>>>>>>>>>>
we need
>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>
look
>>>>>>>>>>>>>>>>>>>>
over all instructions in the function. For example,
>>>>>>>>>>>>>>>>>>>>
'(int)(p+1) -
>>>>>>>>>>>>>>>>> (int)p'
>>>>>>>>>>>>>>>>>>>>
isn't semantically escaping the pointer p because the result
>>>>>>>>>>>>>>>>>>>>
is 1
>>>>>>>>>>>>>>>>>>>>
regardless of the value of p.
>>>>>>>>>>>>>>>>>>>>
Stepwisely defining the semantics of instructions is a
>>>>>>>>>>>>>>>>>>>>
desirable
>>>>>>>>>>>>>>>>>>>
direction
>>>>>>>>>>>>>>>>>>>>
IMO.
>>>>>>>>>>>>>>>>>>>
I'm confused. What in the proposal would prevent us from
>>>>>>>>>>>>>>>>>>>
defining
>>>>>>>>>>>>>>>>>>> the
semantics of instructions, or force us to do it in an
>>>>>>>>>>>>>>>>>>>
"undesirable
>>>>>>>>>>>>>>>>>>>
way"?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
In practice, syntactically checking escapedness + nice
>>>>>>>>>>>>>>>>>>>>
engineering
>>>>>>>>>>>>>>>>> might
>>>>>>>>>>>>>>>>>>>>
not break optimizations in most cases (as undef/poison did);
>>>>>>>>>>>>>>>>>>>>
but it
>>>>>>>>>>>>>>>>> would
>>>>>>>>>>>>>>>>>>>>
be great to move to another level, since LLVM IR is used in
>>>>>>>>>>>>>>>>>>>>
so many
>>>>>>>>>>>>>>>>>>>
places
>>>>>>>>>>>>>>>>>>>>
:)
>>>>>>>>>>>>>>>>>>> The
property under which you can coalesce objects is simple:
>>>>>>>>>>>>>>>>>>>
      It is not observable.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
Now, if the address of one of the two objects you coalesce is
>>>>>>>>>>>>>>>>>>> not
>>>>>>>>>>>>>>>>>>>
observed, coalescing is not observable. That is a sufficient
>>>>>>>>>>>>>>>>>>>
condition
>>>>>>>>>>>>>>>>>>> not
a necessary one. Pointer "escaping" is one step further.
>>>>>>>>>>>>>>>>>>> If
the
>>>>>>>>>>>>>>>>>>>
address doesn't escape it is not observed. This does not mean
>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>
"semantic conditions for coalescing", e.g., for the purpose of
>>>>>>>>>>>>>>>>> translation
>>>>>>>>>>>>>>>>>>>
validation, is supposed to be build on top of our "definition of
>>>>>>>>>>>>>>>>> escaping
>>>>>>>>>>>>>>>>>>>
pointers". That said, we use "does not escape" as a
>>>>>>>>>>>>>>>>>>>
precondition for
>>>>>>>>>>>>>>>>>>>
various transformation and I'm unsure what is any different
>>>>>>>>>>>>>>>>>>>
now. The
>>>>>>>>>>>>>>>>>>>
entire escape argument is only used in the validity of the
>>>>>>>>>>>>>>>>>>>
pointer
>>>>>>>>>>>>>>>>> folding.
>>>>>>>>>>>>>>>>>>>
Similarly, we can fold a comparison of a noalias pointer with
>>>>>>>>>>>>>>>>>>>
another
>>>>>>>>>>>>>>>>> one
>>>>>>>>>>>>>>>>>>> if
the former does not escape (and both are dereferenced and
>>>>>>>>>>>>>>>>>>> one
is
>>>>>>>>>>>>>>>>>>>
written for sure).
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
The pointer comparison is another beast. It indeed has a few
>>>>>>>>>>>>>>>>>>>>
issues,
>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>
solving it might require nontrivial solution.
>>>>>>>>>>>>>>>>>>> I
think the main problem of the inconsistencies and such
>>>>>>>>>>>>>>>>>>>
we've seen is
>>>>>>>>>>>>>>>>>>>
rooted in the erroneous folding of pointer comparisons.
>>>>>>>>>>>>>>>>>>>
Cleaning up
>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>
lifetime marker semantics is actually unrelated and simply
>>>>>>>>>>>>>>>>>>> not
folding
>>>>>>>>>>>>>>>>>>> as
described in A3 should solve the issue that has been
>>>>>>>>>>>>>>>>>>>
reported.
>>>>>>>>>>>>>>>>>>>
Nevertheless,
>>>>>>>>>>>>>>>>>>> we
should take a crack at lifetime markers while we are here.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> ~
Johannes
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
Thanks,
>>>>>>>>>>>>>>>>>>>>
Juneyoung
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
On Tue, Jan 5, 2021 at 9:37 AM Johannes Doerfert <
>>>>>>>>>>>>>>>>>>>
johannesdoerfert at gmail.com>
>>>>>>>>>>>>>>>>>>>>
wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
Hi Juneyoung,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
Happy new year :)
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
After we had a lengthy discussion on phab last year, I've
>>>>>>>>>>>>>>>>>>>>>
tried to
>>>>>>>>>>>>>>>>>>>>>
summarize my thoughts,
>>>>>>>>>>>>>>>>>>>>>
especially given that I had some time to think about things
>>>>>>>>>>>>>>>>>>>>>
over the
>>>>>>>>>>>>>>>>>>>
break.
>>>>>>>>>>>>>>>>>>>>>
Still, no promises on the quality ;)
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
I start with general questions I asked myself and then go on
>>>>>>>>>>>>>>> rambling
>>>>>>>>>>>>>>>>>>>>>
about a potential design.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
Q1: Is lifetime a given property or a derived one, thus is
>>>>>>>>>>>>>>>>>>>>>
it fixed
>>>>>>>>>>>>>>> or
>>>>>>>>>>>>>>>>>>>>>
modifiable?
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
This is a question I asked myself a lot recently. I think
>>>>>>>>>>>>>>>>>>>>>
it is
>>>>>>>>>>>>>>>>> derived
>>>>>>>>>>>>>>>>>>>>>
and modifiable,
>>>>>>>>>>>>>>>>>>>>>
at least I hope it is. Only that would allow
>>>>>>>>>>>>>>>>>>>>>
transformations I would
>>>>>>>>>>>>>>>>>>>
want
>>>>>>>>>>>>>>>>>>>>>
us to do. Here are some examples:
>>>>>>>>>>>>>>>>>>>>>
https://godbolt.org/z/G8obj3
>>>>>>>>>>>>>>>>>>>>>
https://godbolt.org/z/obaTc
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
Q2: Is a pointer comparison, or similar use, extending the
>>>>>>>>>>>>>>>>>>>>>
lifetime?
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
Asked differently, can we hoist a pointer comparison into a
>>>>>>>>>>>>>>>>>>>>>
region
>>>>>>>>>>>>>>>>> where
>>>>>>>>>>>>>>>>>>>>>
the pointer is dead?
>>>>>>>>>>>>>>>>>>>>>
This is an important question which we haven't discussed
>>>>>>>>>>>>>>>>>>>>>
much as we
>>>>>>>>>>>>>>>>>>>>>
assumed LICM has to work.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
The current behavior is that non-dereferencing uses are not
>>>>>>>>>>>>>>> extending
>>>>>>>>>>>>>>>>>>>>>
the lifetime and are
>>>>>>>>>>>>>>>>>>>>>
allowed outside of "lifetime regions" (as indicated by
>>>>>>>>>>>>>>>>>>>>>
markers).
>>>>>>>>>>>>>>> They
>>>>>>>>>>>>>>>>>>>>>
will always produce valid
>>>>>>>>>>>>>>>>>>>>>
results. Though, we might want to think about a lifetime
>>>>>>>>>>>>>>>>>>>>>
marker that
>>>>>>>>>>>>>>>>>>>>>
spits out a new pointer
>>>>>>>>>>>>>>>>>>>>>
value instead of reusing the old one.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
Q3: Can we use lifetime to argue about addresses?
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
The question here is, can we fold address comparisons based on
>>>>>>>>>>>>>>>>>>>>>
lifetimes, or not.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
So far, we fold comparisons based on "address information".
>>>>>>>>>>>>>>>>>>>>>
For
>>>>>>>>>>>>>>>>> example,
>>>>>>>>>>>>>>>>>>>>>
we "know" globals,
>>>>>>>>>>>>>>>>>>>>>
allocas, and mallocs cannot be equal to one another. Also, two
>>>>>>>>>>>>>>>>> distinct
>>>>>>>>>>>>>>>>>>>>>
allocations, for globals
>>>>>>>>>>>>>>>>>>>>>
and allocas, are considered unequal. Now, the crux is that
>>>>>>>>>>>>>>>>>>>>>
we have
>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>> be
>>>>>>>>>>>>>>>>>>>>>
consistent if we do two
>>>>>>>>>>>>>>>>>>>>>
comparisons, and, as of now, we are not (bug number
>>>>>>>>>>>>>>>>>>>>>
missing). Since
>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>
backend (or any other place
>>>>>>>>>>>>>>>>>>>>>
for that matter) might coalesce allocas, their addresses
>>>>>>>>>>>>>>>>>>>>>
might not
>>>>>>>>>>>>>>> be
>>>>>>>>>>>>>>>>>>>>>
different after all. If we
>>>>>>>>>>>>>>>>>>>>>
already folded a comparison to "unequal" we are doomed if
>>>>>>>>>>>>>>>>>>>>>
we later
>>>>>>>>>>>>>>>>> have
>>>>>>>>>>>>>>>>>>>>>
a comparison that results
>>>>>>>>>>>>>>>>>>>>>
in "equal". (Note, this is different from aliasing rules as
>>>>>>>>>>>>>>>>>>>>>
they can
>>>>>>>>>>>>>>>>> be
>>>>>>>>>>>>>>>>>>>>>
stricter.)
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
Design:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
I would hope we can come up with a model that treats memory
>>>>>>>>>>>>>>>>>>>>>
"the
>>>>>>>>>>>>>>>>> same",
>>>>>>>>>>>>>>>>>>>>>
regardless if it is global,
>>>>>>>>>>>>>>>>>>>>>
stack, or heap. I want to avoid special casing one of them
>>>>>>>>>>>>>>>>>>>>>
wrt.
>>>>>>>>>>>>>>>>> lifetime
>>>>>>>>>>>>>>>>>>>>>
as I believe most optimizations
>>>>>>>>>>>>>>>>>>>>>
would apply to any of them, potentially for different
>>>>>>>>>>>>>>>>>>>>>
reasons and
>>>>>>>>>>>>>>> with
>>>>>>>>>>>>>>>>>>>>>
different gains, but nevertheless.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
Proposal (largely based on the conversation in phab):
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
A1: Lifetime is a concept that talks about memory content
>>>>>>>>>>>>>>>>>>>>>
*only*.
>>>>>>>>>>>>>>>>>>>>>
Basically, the memory content is set to
>>>>>>>>>>>>>>>>>>>>>
         undefined by lifetime markers. It is
>>>>>>>>>>>>>>>>>>>>>
derived/modifiable as
>>>>>>>>>>>>>>> it
>>>>>>>>>>>>>>>>> is
>>>>>>>>>>>>>>>>>>>>>
just an "as-is" property of the memory
>>>>>>>>>>>>>>>>>>>>>
         content. The lifetimes of an object, as described by
>>>>>>>>>>>>>>> markers,
>>>>>>>>>>>>>>>>>>>
might
>>>>>>>>>>>>>>>>>>>>>
change during the compilation. They
>>>>>>>>>>>>>>>>>>>>>
         might become smaller if we deduce the object is
>>>>>>>>>>>>>>>>>>>>>
not accessed
>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>>
the memory content is not used, they
>>>>>>>>>>>>>>>>>>>>>
         might become larger if objects with non-overlapping
>>>>>>>>>>>>>>> lifetimes
>>>>>>>>>>>>>>>>> are
>>>>>>>>>>>>>>>>>>>>>
coalesced. (One could see the latter as
>>>>>>>>>>>>>>>>>>>>>
         introducing a new object though.)
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
A2: If we define lifetime as above, it has nothing to do
>>>>>>>>>>>>>>>>>>>>>
with the
>>>>>>>>>>>>>>>>>>>>>
address of an object. Consequently, pointer
>>>>>>>>>>>>>>>>>>>>>
         comparisons and similar operations are valid
>>>>>>>>>>>>>>>>>>>>>
outside the
>>>>>>>>>>>>>>>>> lifetime.
>>>>>>>>>>>>>>>>>>>>>
Loads and stores are as well, they can
>>>>>>>>>>>>>>>>>>>>>
         even not be removed "easily". A store followed by
>>>>>>>>>>>>>>>>>>>>>
a lifetime
>>>>>>>>>>>>>>>>>>>
marker
>>>>>>>>>>>>>>>>>>>>>
or a load following a lifetime marker
>>>>>>>>>>>>>>>>>>>>>
         is dead or results in undef respectively.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
A3: We could not use lifetime to argue about addresses.
>>>>>>>>>>>>>>>>>>>>>
This means
>>>>>>>>>>>>>>> we
>>>>>>>>>>>>>>>>>>>>>
could/should also not argue that overlapping
>>>>>>>>>>>>>>>>>>>>>
         lifetimes result in different addresses. Thus, a
>>>>>>>>>>>>>>>>>>>>>
comparison
>>>>>>>>>>>>>>>>>>>
between
>>>>>>>>>>>>>>>>>>>>>
the address of two allocas could not
>>>>>>>>>>>>>>>>>>>>>
         immediately be folded. That said, there would be
>>>>>>>>>>>>>>>>>>>>>
special
>>>>>>>>>>>>>>> cases
>>>>>>>>>>>>>>>>>>>>>
though. Basically, if one of the allocas does
>>>>>>>>>>>>>>>>>>>>>
         not escape, other than the comparisons to be
>>>>>>>>>>>>>>>>>>>>>
folded, we can
>>>>>>>>>>>>>>>>> fold
>>>>>>>>>>>>>>>>>>>>>
them. Afterwards, coalescing or splitting
>>>>>>>>>>>>>>>>>>>>>
         would still be consistent because it is
>>>>>>>>>>>>>>>>>>>>>
unobservable. The
>>>>>>>>>>>>>>>>> problem
>>>>>>>>>>>>>>>>>>>>>
we have in-tree is that we fold even though
>>>>>>>>>>>>>>>>>>>>>
         the address is still observed (after the fold). It
>>>>>>>>>>>>>>>>>>>>>
is still
>>>>>>>>>>>>>>>>>>>
unclear
>>>>>>>>>>>>>>>>>>>>>
to me what the impact of this would be
>>>>>>>>>>>>>>>>>>>>>
         on real code. I suggested before that we run some
>>>>>>>>>>>>>>> experiments
>>>>>>>>>>>>>>>>>>>
first
>>>>>>>>>>>>>>>>>>>>>
before we make any decision whatsoever.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
This is pretty much saying that lifetime markers are
>>>>>>>>>>>>>>> `memset(undef)`,
>>>>>>>>>>>>>>>>> as
>>>>>>>>>>>>>>>>>>>>>
you suggested before (I think).
>>>>>>>>>>>>>>>>>>>>>
There are some implementation level differences but at the
>>>>>>>>>>>>>>>>>>>>>
end of
>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>
day they are basically the same.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
Happy to hear some thoughts on this, especially if it fixes
>>>>>>>>>>>>>>>>>>>>>
the
>>>>>>>>>>>>>>>>> problems
>>>>>>>>>>>>>>>>>>>>>
that lead to D93376 in the first place.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
~ Johannes
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
On 12/18/20 2:42 AM, Juneyoung Lee via llvm-dev wrote:
>>>>>>>>>>>>>>>>>>>>>>
Hello all,
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
We're discussing the well-formedness of lifetime.start/end
>>>>>>>>>>>>>>> intrinsic
>>>>>>>>>>>>>>>>>>>>>
here (
>>>>>>>>>>>>>>>>>>>>>>
https://reviews.llvm.org/D93376), deciding what is a
>>>>>>>>>>>>>>> (syntactically
>>>>>>>>>>>>>>>>> &
>>>>>>>>>>>>>>>>>>>>>>
semantically) valid use of these intrinsics.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
I'd like to gather more context about the intrinsics.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
1. Is there a frontend or framework that introduces
>>>>>>>>>>>>>>>>>>>>>>
lifetime call
>>>>>>>>>>>>>>>>> with
>>>>>>>>>>>>>>>>>>>>>>
non-stack allocated objects?
>>>>>>>>>>>>>>>>>>>>>>
If exists, which behavior do they expect from it?
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
2. Can a block's lifetime be started bytewise or elementwise?
>>>>>>>>>>>>>>>>>>>>>>
I imagine an optimization that allow using stack very
>>>>>>>>>>>>>>>>>>>>>>
compactly,
>>>>>>>>>>>>>>> but
>>>>>>>>>>>>>>>>>>>>>
wonder
>>>>>>>>>>>>>>>>>>>>>>
there is a real-world use case.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
Thanks,
>>>>>>>>>>>>>>>>>>>>>>
Juneyoung
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
_______________________________________________
>>>>>>>>>>>>>>>>>>>>>>
LLVM Developers mailing list
>>>>>>>>>>>>>>>>>>>>>>
llvm-dev at lists.llvm.org
>>>>>>>>>>>>>>>>>>>>>>
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>
>>
-- 
Website: https://people.mpi-sws.org/~jung/

Juneyoung Lee via llvm-dev

2021-Feb-09 05:46 UTC

head link

[llvm-dev] lifetime.start/end

Hi,

About syntactic constraints - I think it can be helpful for transformations
that need to be fully aware of all allocations' lifetimes.
For example, AddressSanitizer is currently simply giving up instrumenting
things if there is an unknown lifetime usage.
If all lifetime intrinsics' pointers are guaranteed to be known
expressions, this issue is resolved.

if (HasUntracedLifetimeIntrinsic) {
// If there are lifetime intrinsics which couldn't be traced back to an
// alloca, we may not know exactly when a variable enters scope, and
// therefore should "fail safe" by not poisoning them.
StaticAllocaPoisonCallVec.clear();
DynamicAllocaPoisonCallVec.clear();
}

Interestingly, AddressSanitizer is the one that raises an issue as well if
the syntactic restriction is enforced.
It replaces allocas with __asan_stack_malloc_N, but still leaves its
lifetime uses, which breaks the syntactic restriction. A possible solution
for this is to simply remove the lifetime calls.

Juneyoung

On Mon, Feb 1, 2021 at 2:36 PM Johannes Doerfert <johannesdoerfert at
gmail.com>
wrote:
>
> On 1/30/21 7:40 AM, Ralf Jung wrote:
> > Hi Johannes,
> >
> >>>> I was opposing the proposal because of the brittle
syntactic
> >>>> restrictions. In my attempt to justify not having those I
also
> >>>> tried to resolve the above inconsistency in a more general
manner
> >>>> because I thought that was what it was all about. I
believe that we
> >>>> either have to argue the address of a dead object is
> >>>> unusable/undetermined, basically the direction of James
comment, or
> >>>> we need to decouple lifetime markers from locations.
> >>>
> >>> Folding to false and coalescing are inconsistent with each
other,
> >>> yes. It is just wrong to perform both of these on the above
program,
> >>> I would say (if we interpret it with LLVM semantics, which I
assume
> >>> has no "pointer zapping"). Is that up for debate?
> >>>
> >>> However, I do not see the immediate connection to lifetime
markers
> >>> -- the same would be true for a program using malloc/free.
> >>> Coalescing is permissible whenever lifetimes are disjoint.
Folding
> >>> to false is only permissible if the pointers are *definitely
not
> >>> equal*, e.g. if their lifetimes overlap.
> >>>
> >> I still believe folding can be done regardless of their lifetime,
as
> >> long as it is done consistent. Fold all pointer observations and
the
> >> actual
> >> placement is not observable, we went down that road.
> >
> > Sure, but that's the kind of exception that always applies when
> > discussing optimizations.
> > But to make my statement more precise: you cannot fold comparison of
> > *escaped* pointers to false unless the pointers are *definitely not
> > equal*, e.g. if their lifetimes overlap.
> >
> The interesting part is, lifetimes change. I continue to argue you
> cannot fold if both pointers have escaped. If you do, and other things
> happen that fiddle with the lifetime, you might accidentally coalesce
> them (explicitly or implicitly).
>
>
> >> I agree with the malloc/free comparison, I did argue that the
entire
> >> time, or I tried.
> >>
> >> If there is no/little connection to lifetime markers, I assume
> >> https://reviews.llvm.org/D93376 did not fix the inconsistency
then.
> >> It was,
> >> apparently, just an approach to clarify what a lifetime marker is,
> >> right? I might be more confused than before. I believe you should
equate
> >> lifetime markers and memset(undef) and put the entire discussion
> >> about folding/coalescing onto a foundation that is unrelated to
lifetime
> >> markers.
> >
> > My understanding is that lifetime markers are *explicitly intended to
> > support coalescing* by permitting later phases to put allocas with
> > disjoint lifetimes into the same stack slot. D93376 intended to keep
> > it that way; fundamentally changing what lifetime markers are about by
> > making them only about the content and not the address of the alloca
> > seemed like a much more invasive change.
> > Am I misunderstanding something?
>
> I do think they were designed for that, yes. And I do agree my change
> looks more invasive from that perspective.
>
> Now does D93376 help with the problems we have is my question. Or, asked
> differently, what is a way out of this which improves the big picture,
> and what is practical. For the first, limiting it to allocas seems to be
> the wrong way. So far, the argument seems to be that "we only use it
for
> them", but I'm not sure why a use for other pointers would be
somehow
> bad (https://godbolt.org/z/G8obj3). Second, limiting it to syntactic
> constructs is the opposite of practical. It has been said before,
> syntactic constraints are brittle. Beyond that, the syntactic def-use
> chain requirement has not even come up in any of this mails, at least as
> far as I understand it. It still is unclear to me how that would help.
>
> That all said, let's see what we need:
> 1) A way to ensure we don't create inconsistent situations wrt.
> allocation addresses where there were none.
> 2) A way to ensure we can reasonably coalesce allocas to minimize stack
> usage.
>
> For one, I feel we converged pretty much on the fact that pointer
> comparison folding is a problem. In the example below we have lifetime
> markers and we use them to argue about lifetimes of the allocations, but
> we still agree the folding should not happen. Once the folding is
> restricted to pointer pairs for which at least one doesn't have
it's
> address observed, I would like to revisit 1). So far, I believe it would
> be resolved. For 2), we could stick with what we have now, or adapt
> something new afterwards. I guess the existing wording needs to be
> cleared up either way.
>
> Long story short, I don't see how D93376 addresses what seems to be the
> problem.
>
>
>
> >
> >>> The syntactic condition also provides a clear path to evaluate
> >>> whether any given folding or coalescing is correct.
> >>
> >> I disagree. The syntactic conditions, at least as I read them,
allow
> >> the above program with lifetime markers (as we have it now) but
did not
> >> suggest it would be wrong to fold and later coalesce. Or maybe I
> >> missed why the syntactic restrictions would prevent that, please
> >> explain then.
> >
> > Where in that program would you place the lifetime markers? I assume
> > at the beginning and end of the local variable's scope?
> > So that would become something like (also reflecting that allocas are
> > all at the top):
> >
> > void foo() {
> >    intptr_t xp, yp;
> >    int x, y;
> >    { lifetime.start(&x); xp = &x; bar(&x);
lifetime.end(&x); }
> >    { lifetime.start(&y); yp = &y; bar(&y);
lifetime.end(&y); }
> >    print(xp == yp);               // can we fold this comparison to
> > false?
> > }
> >
> Yes, this is how clang would generate the code right now.
>
>
> > I think evaluating correctness of folding to false here is quite clear
> > (I assume this is just using C syntax to represent LLVM IR, so we
> > applyLLVM IR semantics):
>
> Agreed.
>
>
> > - The program as-written is non-deterministic; x and y have disjoint
> > lifetime so they could have the same or a different address.
>
> Agreed (for the C version for which the above is the LLVM-IR
> implementation).
>
>
> > - When the compiler decides to fold the comparison to false, it means
> > it has to commit to one of the possible non-det choices, and needs to
> > stick to that choice.
>
> Agreed.
>
>
> > So *just* replacing "xp == yp" by "false" is
incorrect in the sense
> > that the new program has a possible behavior that did not occur with
> > the old program: "true false" can now be printed (*if* the
allocator
> > chooses to put x and y into the same location), whereas the original
> > program would never print "true false".
>
> Agreed.
>
>
> > Note that we do not need to talk about coalescing here, we are just
> > exploiting the fact that allocation is non-deterministic and that
> > allocations with non-overlapping lifetime can be in the same spot or
> > in different spots.
>
> Sure. When I refer to coalescing I was also including "implicit
> coalescing", e.g., as happened in my outlining example.
>
>
> >
> > IOW, coalescing "x" and "y" in this program is
correct, but folding
> > the comparison to false is incorrect.
>
> Agreed, assuming we don't do anything to bar.
>
>
> >
> > So when you disagree with my statement that we have "a clear path
to
> > evaluate whether any given folding or coalescing is correct",
then
> > which of the above to you disagree with?
>
> None. I disagreed with the part "The syntactic condition also provides
a
> clear path ...", which implies this path was dependent on the
syntactic
> condition. I don't see you making an argument about syntactic def-use
> chains above.
>
> I am basically struggling to understand why we need to introduce
> syntactic restrictions on lifetime markers and why they need to be
> limited to allocas. Those are two concepts included in D93376, right?
>
>
> >
> > Folding to false would be correct if it would be accompanied by also
> > embedding into the program some hint which tells the allocator that
> > the two allocations must not be at the same spot in memory. This would
> > be a way to ensure that LLVM "sticks to its choice". But
that issue is
> > entirely orthogonal to the semantics of lifetime.start / lifetime.end;
> > it arises with allocation lifetimes of any kind, including
malloc/free.
> >
> > I think you are proposing to make them only about the contents of the
> > alloca, so in the above "foo", now the lifetime of (the
allocations
> > backing) "x" and "y" *would* overlap, so folding
to false (of escaped
> > pointers) is... still not allowed, but now for less subtle reasons.
> > But this doesn't *solve* the "fold to false" problem, it
just means
> > that the problem no longer applies to this particular program.
>
> My original email in this thread addressed the folding problem but I
> clarified since: Don't fold if both addresses are observed, regardless
> of any lifetime argument. You seem to mix the comment I made on phab
> last year with the email discussion. The phab conversation is outdated
> and replaced by this email thread.
>
>
> >
> > Or is your argument that, since "fold to false" is wrong in
both
> > cases, it doesn't really matter which choice we make?  However, in
> > your proposal, the option of doing the fold while also "adding
hints
> > for the allocator" no longer exists.
>
> My proposal comes in two parts: Fix the folding issue, then revisit
> markers. I'm not convinced that the folding issue can be fixed by
"hints
> for the allocator" but I believe the escape argument holds.
>
>
>
> >
> >> I don't think anyone wanted that. TBH, I think my proposal is
by far
> >> the most compatible one wrt. the "classic" model w/o
lifetime
> >> markers, given
> >> that I proposed not to entangle them in object allocation.
Everything
> >> else I proposed are "as-if" transformations. Do you
disagree?
> >
> > I agree that making the markers just erase the contents of the alloca
> > is the simplest proposal on the table. *If* we truly can make LLVM
> > follow that model, that would be very nice.
> > I would just be very surprised if it was possible to get this proposal
> > past the people that will carefully watch over the existing
> > optimization potential LLVM has, and make sure it does not become
> > smaller. ;)  Maybe I am too pessimistic, but I would have never dared
> > even making such a proposal as I would expect it to be shot down.
I'll
> > be happy to be proven wrong about this. :)
>
> I've made a few actually, `mustprogress` was the latest and you were
> involved ;). It does prevent certain optimizations we did before
> unconditionally but it also fixes a conceptual problem. At least the
> pointer inconsistency is a conceptual problem that needs fixing. What we
> do with lifetime markers is probably orthogonal.
>
> ~ Johannes
>
>
> >
> > Kind regards,
> > Ralf
> >
> >>
> >> ~ Johannes
> >>
> >>
> >>
> >>>
> >>> Kind regards,
> >>> Ralf
> >>>
> >>>>
> >>>> ~ Johannes
> >>>>
> >>>>
> >>>>>
> >>>>> Kind regards,
> >>>>> Ralf
> >>>>>
> >>>>>> A lot of my arguments around alloca/malloc/... is
that we can
> >>>>>> augment the information from the frontend by
determining if an
> >>>>>> address is observed or not. If it is not, we can
limit its
> >>>>>> allocation, coalesce it, etc. Lifetime markers can
help us do
> >>>>>> this but they are not the only way.
> >>>>>> So, basically, we do not need lifetime markers to
coalesce
> >>>>>> alloca/malloc/... but that they can help.
> >>>>>
> >>>>>
> >>>>>
> >>>>>>
> >>>>>> Let's try to get out of this rabbit hole. If
you have a problem
> >>>>>> with the proposal as outlined in
> >>>>>>
https://lists.llvm.org/pipermail/llvm-dev/2021-January/147594.html
> >>>>>> , please let me know. I assume dealing with
observed addresses
> >>>>>> different than with non-observede ones (implied by
noescape) is
> >>>>>> the way forward now.
> >>>>>>
> >>>>>> ~ Johannes
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>>
> >>>>>>> Kind regards,
> >>>>>>> Ralf
> >>>>>>>
> >>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Kind regards,
> >>>>>>>>> Ralf
> >>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Does that make some sense?
> >>>>>>>>>>
> >>>>>>>>>> ~ Johannes
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> Kind regards,
> >>>>>>>>>>> Ralf
> >>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> On 1/7/21 10:12 PM,
Juneyoung Lee wrote:
> >>>>>>>>>>>>> On Fri, Jan 8, 2021 at
8:54 AM Johannes Doerfert
> >>>>>>>>>>>>> <johannesdoerfert
at gmail.com>
> >>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> On 1/7/21 4:55 PM,
Juneyoung Lee wrote:
> >>>>>>>>>>>>>>> It is, alloca
must have its integral address decided at
> >>>>>>>>>>>>>>> its
*allocation*
> >>>>>>>>>>>>>>> time as well,
like 0xFF00... .
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> For example,
if 0xFFFF0000 ~ 0xFF000000 is already
> >>>>>>>>>>>>>>> allocated, the
new
> >>>>>>>>>>>>>>> alloca cannot
be placed there unless it is not observed
> >>>>>>>>>>>>>>> in the future,
> >>>>>>>>>>>>>>> according to
your proposal.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> But how do we
know from the current state whether the
> >>>>>>>>>>>>>>> stack variable
is
> >>>>>>>>>>>>>>> going to be
observed or not?
> >>>>>>>>>>>>>> With that argument
you could allocate all but 4 bytes, do
> >>>>>>>>>>>>>> a malloc(4)
> >>>>>>>>>>>>>> and know the
address of the returned pointer (assuming it
> >>>>>>>>>>>>>> is not null).
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> What I try to say
is, either your scenario is part of the
> >>>>>>>>>>>>>> model and
> >>>>>>>>>>>>>> everything we do
is broken as you could "observe"
> >>>>>>>>>>>>>> addresses
passively,
> >>>>>>>>>>>>>> *or*, the abstract
machine we use for semantic reasoning
> >>>>>>>>>>>>>> doesn't permit
> >>>>>>>>>>>>>> the above
reasoning. I really hope for the latter.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>> That's a very
valid point..!
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Actually, I have a
memory model that addresses this problem.
> >>>>>>>>>>>>> The gist is that we
can interpret each allocation
> >>>>>>>>>>>>> instruction as
*creating
> >>>>>>>>>>>>> 2 blocks* and
nondeterministically returning one of them.
> >>>>>>>>>>>>> The other one is
marked as invalid but not freed
> >>>>>>>>>>>>> immediately.
Deallocation
> >>>>>>>>>>>>> frees both blocks.
> >>>>>>>>>>>>> If there is not enough
space for having two blocks, it is
> >>>>>>>>>>>>> treated as
> >>>>>>>>>>>>> out-of-memory.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> It makes guessing the
address of an allocation according
> >>>>>>>>>>>>> to memory layout
> >>>>>>>>>>>>> invalid UB.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> p = malloc(4)
> >>>>>>>>>>>>> // If p != null, it is
guaranteed there was at least two 4
> >>>>>>>>>>>>> byte slots
> >>>>>>>>>>>>> available, say
0x100~0x104 and 0x110~0x114.
> >>>>>>>>>>>>> // Two blocks are
allocated at 0x110 and 0x114, and one of
> >>>>>>>>>>>>> the addresses is
> >>>>>>>>>>>>> nondeterministically
returned.
> >>>>>>>>>>>>> // By the nature of
nondeterminism, all executions below
> >>>>>>>>>>>>> should be
> >>>>>>>>>>>>> well-defined
regardless of the address of p.
> >>>>>>>>>>>>> *(int*)0x100 = 10; //
In an execution where p is 0x110,
> >>>>>>>>>>>>> this raises UB.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> The paper link is here
> >>>>>>>>>>>>>
<https://dl.acm.org/doi/10.1145/3276495> FYI.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Nice! Thanks for the link
:)
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>> To argue differently:
Who is to say there is a stack, or
> >>>>>>>>>>>>> only one,
> >>>>>>>>>>>>>> or that alloca
allocates memory "on the one stack"? That
> >>>>>>>>>>>>>> is not part of
> >>>>>>>>>>>>>> the IR, IMHO. I
can write a machine on which alloca
> >>>>>>>>>>>>>> lowers to malloc,
> >>>>>>>>>>>>>> I don't even
need the free during stack unwind but that I
> >>>>>>>>>>>>>> could do as
> >>>>>>>>>>>>>> well if I wanted
to.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> This is right, the
example was for illustrative purposes.
> >>>>>>>>>>>>> IR does not
> >>>>>>>>>>>>> enforce alloca to be
placed at 'stack'.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Making
icmp/ptrtoint yield poison will still make loop
> >>>>>>>>>>>>>>> versioning or
> >>>>>>>>>>>>>>> pointer
rewriting transformations unsound because these
> >>>>>>>>>>>>>>> operations now
> >>>>>>>>>>>>>> can
> >>>>>>>>>>>>>>> create poison
(even if pointers are noundef).
> >>>>>>>>>>>>>> I did not say they
yield poison, at least I did not try
> >>>>>>>>>>>>>> to say that.
> >>>>>>>>>>>>>> What are you
referring to exactly?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>> I was referring to
this:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>>> But this makes
ptrtoint/icmp make UB-raising
> >>>>>>>>>>>>>>> instructions,
which
> >>>>>>>>>>>>>>> contradicts
with what LLVM does.
> >>>>>>>>>>>>>> As with other
violation of attributes I would, on first
> >>>>>>>>>>>>>> though, suggest
> >>>>>>>>>>>>>> to produce poison,
not UB.
> >>>>>>>>>>>>> But it is more about
the (imaginary) attribute, so maybe I
> >>>>>>>>>>>>> was slightly out
> >>>>>>>>>>>>> of topic.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> 2) is fine, I
think the suggestion semantically makes
> >>>>>>>>>>>>>> sense perfectly.
1)
> >>>>>>>>>>>>>>> is something
I'm concerned about now.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> There are more
than pointer foldings, such as rewriting
> >>>>>>>>>>>>>>> such
expression,
> >>>>>>>>>>>>>>> code motion
ptr cmp, introduce ptr cmp, etc. There is
> >>>>>>>>>>>>>>> also analysis
> >>>>>>>>>>>>>> relying
> >>>>>>>>>>>>>>> on ptr cmp.
> >>>>>>>>>>>>>>> Defining the
correctness of each of them is something we
> >>>>>>>>>>>>>>> want to avoid,
> >>>>>>>>>>>>>> and
> >>>>>>>>>>>>>>> maybe
that's why we want to define precise semantics for
> >>>>>>>>>>>>>>> things.
> >>>>>>>>>>>>>> I don't get
the point. My proposal does not change the
> >>>>>>>>>>>>>> semantics of
> >>>>>>>>>>>>>> pointer
comparisons, at all. I explicitly mentioned that
> >>>>>>>>>>>>>> in the last
> >>>>>>>>>>>>>> email.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>> Oh okay, I thought it
was a part of the lifetime proposal,
> >>>>>>>>>>>>> but it seems
> >>>>>>>>>>>>> more like a separate
thing.
> >>>>>>>>>>>>> I agree that this
requires performance impact.
> >>>>>>>>>>>>> Also investigation of
existing transformations would be
> >>>>>>>>>>>>> needed; Alive2's
> >>>>>>>>>>>>> pointer comparison is
doing approximation yet, but if it
> >>>>>>>>>>>>> becomes fully
> >>>>>>>>>>>>> precise, it will show
something from running LLVM unit
> >>>>>>>>>>>>> tests I believe..! :)
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> I think this will
be less aggressive and may give nice
> >>>>>>>>>>>>>> feedback to
> >>>>>>>>>>>>>>> potential
projects that are using lifetime with non-alloca.
> >>>>>>>>>>>>>> The lifetime
marker debate, basically 2) above, is
> >>>>>>>>>>>>>> orthogonal to the
> >>>>>>>>>>>>>> problem
> >>>>>>>>>>>>>> you try to solve.
It got mixed in as lifetime markers
> >>>>>>>>>>>>>> were used by
> >>>>>>>>>>>>>> StackColoring
> >>>>>>>>>>>>>> to perform
coalescing but that is coincidental. You can
> >>>>>>>>>>>>>> (arguably)
> >>>>>>>>>>>>>> coalesce stack
> >>>>>>>>>>>>>> allocations
regardless of lifetime markers and with 1)
> >>>>>>>>>>>>>> such a
> >>>>>>>>>>>>>> transformation
> >>>>>>>>>>>>>> (w/ and w/o
lifetime markers) would actually be sound.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> At the end, we
can implement IR writer that lowers
> >>>>>>>>>>>>>>> lifetime with
> >>>>>>>>>>>>>> non-alloca
> >>>>>>>>>>>>>>> into
memset(undef). WDYT?
> >>>>>>>>>>>>>> Yeah, 2) is
orthogonal and we can lower it that way.
> >>>>>>>>>>>>>> Unsure if it is
> >>>>>>>>>>>>>> helpful
> >>>>>>>>>>>>>> but we can
certainly define it that way in the LangRef.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>> Okay, thanks!
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> ~ Johannes
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>>> Juneyoung
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> p.s. The reply
was late, sorry. I think I can spend more
> >>>>>>>>>>>>>>> time on this
> >>>>>>>>>>>>>> today.
> >>>>>>>>>>>>>>> On Thu, Jan 7,
2021 at 9:02 AM Johannes Doerfert <
> >>>>>>>>>>>>>> johannesdoerfert
at gmail.com>
> >>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On 1/6/21
4:33 PM, Juneyoung Lee wrote:
>
>>>>>>>>>>>>>>>>>>>
Stepwisely defining the semantics of instructions is
>
>>>>>>>>>>>>>>>>>>> a
desirable
> >>>>>>>>>>>>>>>>>
direction
>
>>>>>>>>>>>>>>>>>>>
IMO.
> >>>>>>>>>>>>>>>>>>
I'm confused. What in the proposal would prevent us
> >>>>>>>>>>>>>>>>>>
from defining
> >>>>>>>>>>>>>>>>>>
the semantics of instructions, or force us to do it
> >>>>>>>>>>>>>>>>>> in
an "undesirable
> >>>>>>>>>>>>>>>>>
way"?
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> I
meant it would be great if the output state after
> >>>>>>>>>>>>>>>>>
executing an
> >>>>>>>>>>>>>>>>>
instruction can be described using its input state.
> >>>>>>>>>>>>>>>>> (that
was the meaning of 'stepwise semantics', I
> >>>>>>>>>>>>>>>>> should
have been more
> >>>>>>>>>>>>>>>>> clear
about this)
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> For
example, the semantics of 'z = add x y' can be
> >>>>>>>>>>>>>>>>>
defined as follows:
> >>>>>>>>>>>>>>>>> Given
an input state s, next state s' = s[z -> s(x) +
> >>>>>>>>>>>>>>>>> s(y)]
> >>>>>>>>>>>>>>>>> where
s(x) is the value of x in the previous state,
> >>>>>>>>>>>>>>>>> and
s[z -> v] is a
> >>>>>>>>>>>>>>>>> state
with z updated to v.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
Another example that involves memory: the semantics of
> >>>>>>>>>>>>>>>>> 'i
= ptrtoint p'
> >>>>>>>>>>>>>>>> can
> >>>>>>>>>>>>>>>>> be
defined as follows:
> >>>>>>>>>>>>>>>>> Given
an input state s, next state s' = s[i ->
> >>>>>>>>>>>>>>>>>
s(p).obj.address +
> >>>>>>>>>>>>>>>>>
s(p).offset]
> >>>>>>>>>>>>>>>>> where
obj.address is the begin address of a memory
> >>>>>>>>>>>>>>>>> object
obj pointed
> >>>>>>>>>>>>>> by
> >>>>>>>>>>>>>>>> p
> >>>>>>>>>>>>>>>>> &
offset is p's byte offset. (Imagine a pointer to the
> >>>>>>>>>>>>>>>>> offset
of some
> >>>>>>>>>>>>>>>> char
> >>>>>>>>>>>>>>>>>
array).
> >>>>>>>>>>>>>>>>> Note
that ptrtoint & add can be nicely defined w.r.t
> >>>>>>>>>>>>>>>>> the
input state.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Now,
the instruction that we're facing is 'p = alloca'.
> >>>>>>>>>>>>>>>>> To
describe the output state after executing 'p >
>>>>>>>>>>>>>>>>>
alloca', the address
> >>>>>>>>>>>>>> of
> >>>>>>>>>>>>>>>>> new
alloca should be determined.
> >>>>>>>>>>>>>>>>> If
observedness is involved, we need to know the
> >>>>>>>>>>>>>>>>> future
state again. :/
> >>>>>>>>>>>>>>>> We
> >>>>>>>>>>>>>>>>>
don't know whether the alloca is going to be observed
> >>>>>>>>>>>>>>>>> or not
without
> >>>>>>>>>>>>>>>> seeing
> >>>>>>>>>>>>>>>>> the
future.
> >>>>>>>>>>>>>>>>> This
is the problem of the current lifetime intrinsics
> >>>>>>>>>>>>>>>>> as
well.
> >>>>>>>>>>>>>>>> No, you
mix things up here.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Nobody
proposed to modify the semantics of `alloca`.
> >>>>>>>>>>>>>>>> `alloca`
provides you with a fresh, unobserved block of
> >>>>>>>>>>>>>>>>
dereferenceable memory that is implicitly freed as the
> >>>>>>>>>>>>>>>> stack
> >>>>>>>>>>>>>>>> unwinds.
That is it. No context necessary.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> If you
want to modify the IR, you need to argue the
> >>>>>>>>>>>>>>>> observable
> >>>>>>>>>>>>>>>> semantics
which is nothing new. That this might require
> >>>>>>>>>>>>>>>> more than
> >>>>>>>>>>>>>>>> a peephole
view of the program is also not new.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> One
possible approach to resolve this is adding an
> >>>>>>>>>>>>>>>>>
'unobserved' flag to
> >>>>>>>>>>>>>>>>> alloca
instruction (similar to what was suggested by
> >>>>>>>>>>>>>>>>>
Nicolai).
> >>>>>>>>>>>>>>>>> And,
we can say that if alloca with 'unobserved' is
> >>>>>>>>>>>>>>>>> used
by
> >>>>>>>>>>>>>>>>
ptrtoint/icmp,
> >>>>>>>>>>>>>>>>> it is
UB.
> >>>>>>>>>>>>>>>> The flag
can be added, like we add other attributes. It
> >>>>>>>>>>>>>>>> should not
> >>>>>>>>>>>>>>>> be
required for any optimization we talked about
> >>>>>>>>>>>>>>>> though. It
basically
> >>>>>>>>>>>>>>>> is a way
to manifest derived or given information into
> >>>>>>>>>>>>>>>> the IR.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Attribute
deduction, as well as frontends with domain
> >>>>>>>>>>>>>>>> knowledge,
> >>>>>>>>>>>>>>>> can add
such information. The flag we discussed in phab
> >>>>>>>>>>>>>>>> was not
even
> >>>>>>>>>>>>>>>> sufficient
for all the transformation examples I
> >>>>>>>>>>>>>>>> presented
in my mail,
> >>>>>>>>>>>>>>>> that is
why I extended my  argument. We could still
> >>>>>>>>>>>>>>>> have a
"noescape"
> >>>>>>>>>>>>>>>> flag for
allocas, but I'm not sure how useful that
> >>>>>>>>>>>>>>>> really is.
We can
> >>>>>>>>>>>>>>>> certainly
deduce it and manifest it, unsure if we have
> >>>>>>>>>>>>>>>> domain
knowledge
> >>>>>>>>>>>>>>>> we can use
for non-trivial cases though.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> But
this makes ptrtoint/icmp make UB-raising
> >>>>>>>>>>>>>>>>>
instructions, which
> >>>>>>>>>>>>>>>>>
contradicts with what LLVM does.
> >>>>>>>>>>>>>>>> As with
other violation of attributes I would, on first
> >>>>>>>>>>>>>>>> though,
suggest
> >>>>>>>>>>>>>>>> to produce
poison, not UB.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Also,
existing optimizations like loop versioning can
> >>>>>>>>>>>>>>>>>
introduce
> >>>>>>>>>>>>>>>>>
ptrtoint/pointer comparisons too.
> >>>>>>>>>>>>>>>> Sure. I am
not certain why that is a problem. I get the
> >>>>>>>>>>>>>>>> feeling
things
> >>>>>>>>>>>>>>>> are still
mixed up here.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> What I
proposed is twofold:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> 1) We stop
folding comparisons between different
> >>>>>>>>>>>>>>>> allocas if
changing the
> >>>>>>>>>>>>>>>> address
> >>>>>>>>>>>>>>>>       of
both might be observable. Thus, if both might
> >>>>>>>>>>>>>>>> have their
address
> >>>>>>>>>>>>>>>>
"taken"/escaped,
> >>>>>>>>>>>>>>>>      
other than the comparisons we want to fold, we
> >>>>>>>>>>>>>>>> cannot
proceed.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> 2) We
define lifetime markers to mean `memset(undef)`.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> The first
should be sufficient for the problem you were
> >>>>>>>>>>>>>>>> trying to
solve
> >>>>>>>>>>>>>>>> in the
> >>>>>>>>>>>>>>>> first
place. The second makes lifetime markers less
> >>>>>>>>>>>>>>>> weird.
Note that 1)
> >>>>>>>>>>>>>>>> is not
changing
> >>>>>>>>>>>>>>>> the
semantics of the IR. We basically just argue there
> >>>>>>>>>>>>>>>> is a bug
in our
> >>>>>>>>>>>>>>>>
instcombine right
> >>>>>>>>>>>>>>>> now as we
do not check all necessary preconditions.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> I see
that there are other questions that I didn't
> >>>>>>>>>>>>>>>>> answer
yet, but let
> >>>>>>>>>>>>>> me
> >>>>>>>>>>>>>>>>> answer
this first to limit the length of the text :)
> >>>>>>>>>>>>>>>> Sure, we
can split the discussion :)
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> ~ Johannes
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
Thanks,
> >>>>>>>>>>>>>>>>>
Juneyoung
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> On
Thu, Jan 7, 2021 at 3:36 AM Johannes Doerfert <
> >>>>>>>>>>>>>>>>
johannesdoerfert at gmail.com>
> >>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> On
1/5/21 8:00 PM, Juneyoung Lee wrote:
>
>>>>>>>>>>>>>>>>>>> Hi
Johannes,
>
>>>>>>>>>>>>>>>>>>>
>
>>>>>>>>>>>>>>>>>>> I
read your proposal and thought about the model.
> >>>>>>>>>>>>>>>>>>
Cool, thanks!
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
>
>>>>>>>>>>>>>>>>>>> As
you concerned in A3, certain programs may be
>
>>>>>>>>>>>>>>>>>>>
valid only when
> >>>>>>>>>>>>>> memory
>
>>>>>>>>>>>>>>>>>>>
blocks with overlapping lifetimes have disjoint
>
>>>>>>>>>>>>>>>>>>>
addresses.
>
>>>>>>>>>>>>>>>>>>>
Look at this example (I'm using malloc, but alloca
>
>>>>>>>>>>>>>>>>>>>
also works):
>
>>>>>>>>>>>>>>>>>>>
>
>>>>>>>>>>>>>>>>>>> p1
= malloc(4)
>
>>>>>>>>>>>>>>>>>>> p2
= malloc(4) // for brevity, assume that there as
>
>>>>>>>>>>>>>>>>>>>
enough space & p1
> >>>>>>>>>>>>>>>> and
>
>>>>>>>>>>>>>>>>>>> p2
!= null
>
>>>>>>>>>>>>>>>>>>>
set<char*> s;
>
>>>>>>>>>>>>>>>>>>>
s.insert(p1); s.insert(p2); // If the second insert
>
>>>>>>>>>>>>>>>>>>> did
nothing, it
> >>>>>>>>>>>>>>>> would
>
>>>>>>>>>>>>>>>>>>> be
surprise to programmers
>
>>>>>>>>>>>>>>>>>>>
work(s);
>
>>>>>>>>>>>>>>>>>>>
free(data1)
>
>>>>>>>>>>>>>>>>>>>
free(data2)
>
>>>>>>>>>>>>>>>>>>>
>
>>>>>>>>>>>>>>>>>>>
Clearly, IR semantics should guarantee that escaped
>
>>>>>>>>>>>>>>>>>>>
blocks are
> >>>>>>>>>>>>>>>> disjoint.
> >>>>>>>>>>>>>>>>>> It
>
>>>>>>>>>>>>>>>>>>>
would be great for verification tools on LLVM IR to
>
>>>>>>>>>>>>>>>>>>> be
able to answer
> >>>>>>>>>>>>>>>>>>
that
>
>>>>>>>>>>>>>>>>>>> the
second insert will succeed.
> >>>>>>>>>>>>>>>>>> I
agree, the second insert should succeed, assuming
> >>>>>>>>>>>>>>>>>>
`p1 && p2`.
> >>>>>>>>>>>>>>>>>> I
don't think my proposal would in any way impact the
> >>>>>>>>>>>>>>>>>>
program above,
> >>>>>>>>>>>>>>>>>> if
anything the A3 reasoning makes sure such a
> >>>>>>>>>>>>>>>>>>
program with allocas
> >>>>>>>>>>>>>>>>>> is
not miscompiled.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
I'm also not sure I understand what you try to argue
> >>>>>>>>>>>>>>>>>>
for. Maybe
> >>>>>>>>>>>>>>>>>>
elaborate a bit what it is you think is bad or needs
> >>>>>>>>>>>>>>>>>> to
be changed.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
>
>>>>>>>>>>>>>>>>>>> The
problem is that definition of escapedness is not
>
>>>>>>>>>>>>>>>>>>>
clear at the
> >>>>>>>>>>>>>>>>>>
semantic
>
>>>>>>>>>>>>>>>>>>>
level. Describing the IR semantics w.r.t. LLVM's
>
>>>>>>>>>>>>>>>>>>>
escaped analysis
> >>>>>>>>>>>>>>>>>>
function
>
>>>>>>>>>>>>>>>>>>>
isn't something we want.
>
>>>>>>>>>>>>>>>>>>>
>
>>>>>>>>>>>>>>>>>>>
Semantic definition of escapedness of a pointer
>
>>>>>>>>>>>>>>>>>>>
seems hard, I mean
> >>>>>>>>>>>>>> in a
>
>>>>>>>>>>>>>>>>>>>
stepwise manner.
>
>>>>>>>>>>>>>>>>>>> It
isn't a single instruction such as 'escape i8*
>
>>>>>>>>>>>>>>>>>>>
ptr', and we need
> >>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>>
look
>
>>>>>>>>>>>>>>>>>>>
over all instructions in the function. For example,
>
>>>>>>>>>>>>>>>>>>>
'(int)(p+1) -
> >>>>>>>>>>>>>>>>
(int)p'
>
>>>>>>>>>>>>>>>>>>>
isn't semantically escaping the pointer p because
>
>>>>>>>>>>>>>>>>>>> the
result is 1
>
>>>>>>>>>>>>>>>>>>>
regardless of the value of p.
>
>>>>>>>>>>>>>>>>>>>
Stepwisely defining the semantics of instructions is
>
>>>>>>>>>>>>>>>>>>> a
desirable
> >>>>>>>>>>>>>>>>>>
direction
>
>>>>>>>>>>>>>>>>>>>
IMO.
> >>>>>>>>>>>>>>>>>>
I'm confused. What in the proposal would prevent us
> >>>>>>>>>>>>>>>>>>
from defining
> >>>>>>>>>>>>>>>>>>
the semantics of instructions, or force us to do it
> >>>>>>>>>>>>>>>>>> in
an "undesirable
> >>>>>>>>>>>>>>>>>>
way"?
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
>
>>>>>>>>>>>>>>>>>>> In
practice, syntactically checking escapedness +
>
>>>>>>>>>>>>>>>>>>>
nice engineering
> >>>>>>>>>>>>>>>> might
>
>>>>>>>>>>>>>>>>>>> not
break optimizations in most cases (as
>
>>>>>>>>>>>>>>>>>>>
undef/poison did); but it
> >>>>>>>>>>>>>>>> would
>
>>>>>>>>>>>>>>>>>>> be
great to move to another level, since LLVM IR is
>
>>>>>>>>>>>>>>>>>>>
used in so many
> >>>>>>>>>>>>>>>>>>
places
>
>>>>>>>>>>>>>>>>>>> :)
> >>>>>>>>>>>>>>>>>>
The property under which you can coalesce objects is
> >>>>>>>>>>>>>>>>>>
simple:
> >>>>>>>>>>>>>>>>>>   
It is not observable.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
Now, if the address of one of the two objects you
> >>>>>>>>>>>>>>>>>>
coalesce is not
> >>>>>>>>>>>>>>>>>>
observed, coalescing is not observable. That is a
> >>>>>>>>>>>>>>>>>>
sufficient condition
> >>>>>>>>>>>>>>>>>>
not a necessary one. Pointer "escaping" is one step
> >>>>>>>>>>>>>>>>>>
further. If the
> >>>>>>>>>>>>>>>>>>
address doesn't escape it is not observed. This does
> >>>>>>>>>>>>>>>>>>
not mean the
> >>>>>>>>>>>>>>>>>>
"semantic conditions for coalescing", e.g., for the
> >>>>>>>>>>>>>>>>>>
purpose of
> >>>>>>>>>>>>>>>>
translation
> >>>>>>>>>>>>>>>>>>
validation, is supposed to be build on top of our
> >>>>>>>>>>>>>>>>>>
"definition of
> >>>>>>>>>>>>>>>> escaping
> >>>>>>>>>>>>>>>>>>
pointers". That said, we use "does not escape" as a
> >>>>>>>>>>>>>>>>>>
precondition for
> >>>>>>>>>>>>>>>>>>
various transformation and I'm unsure what is any
> >>>>>>>>>>>>>>>>>>
different now. The
> >>>>>>>>>>>>>>>>>>
entire escape argument is only used in the validity
> >>>>>>>>>>>>>>>>>> of
the pointer
> >>>>>>>>>>>>>>>> folding.
> >>>>>>>>>>>>>>>>>>
Similarly, we can fold a comparison of a noalias
> >>>>>>>>>>>>>>>>>>
pointer with another
> >>>>>>>>>>>>>>>> one
> >>>>>>>>>>>>>>>>>> if
the former does not escape (and both are
> >>>>>>>>>>>>>>>>>>
dereferenced and one is
> >>>>>>>>>>>>>>>>>>
written for sure).
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
>
>>>>>>>>>>>>>>>>>>> The
pointer comparison is another beast. It indeed
>
>>>>>>>>>>>>>>>>>>> has
a few issues,
> >>>>>>>>>>>>>>>> and
>
>>>>>>>>>>>>>>>>>>>
solving it might require nontrivial solution.
> >>>>>>>>>>>>>>>>>> I
think the main problem of the inconsistencies and
> >>>>>>>>>>>>>>>>>>
such we've seen is
> >>>>>>>>>>>>>>>>>>
rooted in the erroneous folding of pointer
> >>>>>>>>>>>>>>>>>>
comparisons. Cleaning up
> >>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>
lifetime marker semantics is actually unrelated and
> >>>>>>>>>>>>>>>>>>
simply not folding
> >>>>>>>>>>>>>>>>>> as
described in A3 should solve the issue that has
> >>>>>>>>>>>>>>>>>>
been reported.
> >>>>>>>>>>>>>>>>>>
Nevertheless,
> >>>>>>>>>>>>>>>>>> we
should take a crack at lifetime markers while we
> >>>>>>>>>>>>>>>>>>
are here.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> ~
Johannes
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
>
>>>>>>>>>>>>>>>>>>>
Thanks,
>
>>>>>>>>>>>>>>>>>>>
Juneyoung
>
>>>>>>>>>>>>>>>>>>>
>
>>>>>>>>>>>>>>>>>>> On
Tue, Jan 5, 2021 at 9:37 AM Johannes Doerfert <
> >>>>>>>>>>>>>>>>>>
johannesdoerfert at gmail.com>
>
>>>>>>>>>>>>>>>>>>>
wrote:
>
>>>>>>>>>>>>>>>>>>>
>
>>>>>>>>>>>>>>>>>>>>
Hi Juneyoung,
>
>>>>>>>>>>>>>>>>>>>>
>
>>>>>>>>>>>>>>>>>>>>
Happy new year :)
>
>>>>>>>>>>>>>>>>>>>>
>
>>>>>>>>>>>>>>>>>>>>
After we had a lengthy discussion on phab last
>
>>>>>>>>>>>>>>>>>>>>
year, I've tried to
>
>>>>>>>>>>>>>>>>>>>>
summarize my thoughts,
>
>>>>>>>>>>>>>>>>>>>>
especially given that I had some time to think
>
>>>>>>>>>>>>>>>>>>>>
about things over the
> >>>>>>>>>>>>>>>>>>
break.
>
>>>>>>>>>>>>>>>>>>>>
Still, no promises on the quality ;)
>
>>>>>>>>>>>>>>>>>>>>
>
>>>>>>>>>>>>>>>>>>>>
I start with general questions I asked myself and
>
>>>>>>>>>>>>>>>>>>>>
then go on
> >>>>>>>>>>>>>> rambling
>
>>>>>>>>>>>>>>>>>>>>
about a potential design.
>
>>>>>>>>>>>>>>>>>>>>
>
>>>>>>>>>>>>>>>>>>>>
>
>>>>>>>>>>>>>>>>>>>>
Q1: Is lifetime a given property or a derived one,
>
>>>>>>>>>>>>>>>>>>>>
thus is it fixed
> >>>>>>>>>>>>>> or
>
>>>>>>>>>>>>>>>>>>>>
modifiable?
>
>>>>>>>>>>>>>>>>>>>>
>
>>>>>>>>>>>>>>>>>>>>
This is a question I asked myself a lot recently. I
>
>>>>>>>>>>>>>>>>>>>>
think it is
> >>>>>>>>>>>>>>>> derived
>
>>>>>>>>>>>>>>>>>>>>
and modifiable,
>
>>>>>>>>>>>>>>>>>>>>
at least I hope it is. Only that would allow
>
>>>>>>>>>>>>>>>>>>>>
transformations I would
> >>>>>>>>>>>>>>>>>>
want
>
>>>>>>>>>>>>>>>>>>>>
us to do. Here are some examples:
>
>>>>>>>>>>>>>>>>>>>>
https://godbolt.org/z/G8obj3
>
>>>>>>>>>>>>>>>>>>>>
https://godbolt.org/z/obaTc
>
>>>>>>>>>>>>>>>>>>>>
>
>>>>>>>>>>>>>>>>>>>>
>
>>>>>>>>>>>>>>>>>>>>
Q2: Is a pointer comparison, or similar use,
>
>>>>>>>>>>>>>>>>>>>>
extending the lifetime?
>
>>>>>>>>>>>>>>>>>>>>
>
>>>>>>>>>>>>>>>>>>>>
Asked differently, can we hoist a pointer
>
>>>>>>>>>>>>>>>>>>>>
comparison into a region
> >>>>>>>>>>>>>>>> where
>
>>>>>>>>>>>>>>>>>>>>
the pointer is dead?
>
>>>>>>>>>>>>>>>>>>>>
This is an important question which we haven't
>
>>>>>>>>>>>>>>>>>>>>
discussed much as we
>
>>>>>>>>>>>>>>>>>>>>
assumed LICM has to work.
>
>>>>>>>>>>>>>>>>>>>>
>
>>>>>>>>>>>>>>>>>>>>
The current behavior is that non-dereferencing uses
>
>>>>>>>>>>>>>>>>>>>>
are not
> >>>>>>>>>>>>>> extending
>
>>>>>>>>>>>>>>>>>>>>
the lifetime and are
>
>>>>>>>>>>>>>>>>>>>>
allowed outside of "lifetime regions" (as indicated
>
>>>>>>>>>>>>>>>>>>>>
by markers).
> >>>>>>>>>>>>>> They
>
>>>>>>>>>>>>>>>>>>>>
will always produce valid
>
>>>>>>>>>>>>>>>>>>>>
results. Though, we might want to think about a
>
>>>>>>>>>>>>>>>>>>>>
lifetime marker that
>
>>>>>>>>>>>>>>>>>>>>
spits out a new pointer
>
>>>>>>>>>>>>>>>>>>>>
value instead of reusing the old one.
>
>>>>>>>>>>>>>>>>>>>>
>
>>>>>>>>>>>>>>>>>>>>
>
>>>>>>>>>>>>>>>>>>>>
Q3: Can we use lifetime to argue about addresses?
>
>>>>>>>>>>>>>>>>>>>>
>
>>>>>>>>>>>>>>>>>>>>
The question here is, can we fold address
>
>>>>>>>>>>>>>>>>>>>>
comparisons based on
>
>>>>>>>>>>>>>>>>>>>>
lifetimes, or not.
>
>>>>>>>>>>>>>>>>>>>>
>
>>>>>>>>>>>>>>>>>>>>
So far, we fold comparisons based on "address
>
>>>>>>>>>>>>>>>>>>>>
information". For
> >>>>>>>>>>>>>>>> example,
>
>>>>>>>>>>>>>>>>>>>>
we "know" globals,
>
>>>>>>>>>>>>>>>>>>>>
allocas, and mallocs cannot be equal to one
>
>>>>>>>>>>>>>>>>>>>>
another. Also, two
> >>>>>>>>>>>>>>>> distinct
>
>>>>>>>>>>>>>>>>>>>>
allocations, for globals
>
>>>>>>>>>>>>>>>>>>>>
and allocas, are considered unequal. Now, the crux
>
>>>>>>>>>>>>>>>>>>>>
is that we have
> >>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>> be
>
>>>>>>>>>>>>>>>>>>>>
consistent if we do two
>
>>>>>>>>>>>>>>>>>>>>
comparisons, and, as of now, we are not (bug number
>
>>>>>>>>>>>>>>>>>>>>
missing). Since
> >>>>>>>>>>>>>>>> the
>
>>>>>>>>>>>>>>>>>>>>
backend (or any other place
>
>>>>>>>>>>>>>>>>>>>>
for that matter) might coalesce allocas, their
>
>>>>>>>>>>>>>>>>>>>>
addresses might not
> >>>>>>>>>>>>>> be
>
>>>>>>>>>>>>>>>>>>>>
different after all. If we
>
>>>>>>>>>>>>>>>>>>>>
already folded a comparison to "unequal" we are
>
>>>>>>>>>>>>>>>>>>>>
doomed if we later
> >>>>>>>>>>>>>>>> have
>
>>>>>>>>>>>>>>>>>>>>
a comparison that results
>
>>>>>>>>>>>>>>>>>>>>
in "equal". (Note, this is different from aliasing
>
>>>>>>>>>>>>>>>>>>>>
rules as they can
> >>>>>>>>>>>>>>>> be
>
>>>>>>>>>>>>>>>>>>>>
stricter.)
>
>>>>>>>>>>>>>>>>>>>>
>
>>>>>>>>>>>>>>>>>>>>
>
>>>>>>>>>>>>>>>>>>>>
Design:
>
>>>>>>>>>>>>>>>>>>>>
>
>>>>>>>>>>>>>>>>>>>>
I would hope we can come up with a model that
>
>>>>>>>>>>>>>>>>>>>>
treats memory "the
> >>>>>>>>>>>>>>>>
same",
>
>>>>>>>>>>>>>>>>>>>>
regardless if it is global,
>
>>>>>>>>>>>>>>>>>>>>
stack, or heap. I want to avoid special casing one
>
>>>>>>>>>>>>>>>>>>>>
of them wrt.
> >>>>>>>>>>>>>>>> lifetime
>
>>>>>>>>>>>>>>>>>>>>
as I believe most optimizations
>
>>>>>>>>>>>>>>>>>>>>
would apply to any of them, potentially for
>
>>>>>>>>>>>>>>>>>>>>
different reasons and
> >>>>>>>>>>>>>> with
>
>>>>>>>>>>>>>>>>>>>>
different gains, but nevertheless.
>
>>>>>>>>>>>>>>>>>>>>
>
>>>>>>>>>>>>>>>>>>>>
>
>>>>>>>>>>>>>>>>>>>>
Proposal (largely based on the conversation in phab):
>
>>>>>>>>>>>>>>>>>>>>
>
>>>>>>>>>>>>>>>>>>>>
A1: Lifetime is a concept that talks about memory
>
>>>>>>>>>>>>>>>>>>>>
content *only*.
>
>>>>>>>>>>>>>>>>>>>>
Basically, the memory content is set to
>
>>>>>>>>>>>>>>>>>>>>
undefined by lifetime markers. It is
>
>>>>>>>>>>>>>>>>>>>>
derived/modifiable as
> >>>>>>>>>>>>>> it
> >>>>>>>>>>>>>>>> is
>
>>>>>>>>>>>>>>>>>>>>
just an "as-is" property of the memory
>
>>>>>>>>>>>>>>>>>>>>
content. The lifetimes of an object, as
>
>>>>>>>>>>>>>>>>>>>>
described by
> >>>>>>>>>>>>>> markers,
> >>>>>>>>>>>>>>>>>>
might
>
>>>>>>>>>>>>>>>>>>>>
change during the compilation. They
>
>>>>>>>>>>>>>>>>>>>>
might become smaller if we deduce the
>
>>>>>>>>>>>>>>>>>>>>
object is not accessed
> >>>>>>>>>>>>>>>> and
>
>>>>>>>>>>>>>>>>>>>>
the memory content is not used, they
>
>>>>>>>>>>>>>>>>>>>>
might become larger if objects with
>
>>>>>>>>>>>>>>>>>>>>
non-overlapping
> >>>>>>>>>>>>>> lifetimes
> >>>>>>>>>>>>>>>> are
>
>>>>>>>>>>>>>>>>>>>>
coalesced. (One could see the latter as
>
>>>>>>>>>>>>>>>>>>>>
introducing a new object though.)
>
>>>>>>>>>>>>>>>>>>>>
>
>>>>>>>>>>>>>>>>>>>>
A2: If we define lifetime as above, it has nothing
>
>>>>>>>>>>>>>>>>>>>>
to do with the
>
>>>>>>>>>>>>>>>>>>>>
address of an object. Consequently, pointer
>
>>>>>>>>>>>>>>>>>>>>
comparisons and similar operations are
>
>>>>>>>>>>>>>>>>>>>>
valid outside the
> >>>>>>>>>>>>>>>> lifetime.
>
>>>>>>>>>>>>>>>>>>>>
Loads and stores are as well, they can
>
>>>>>>>>>>>>>>>>>>>>
even not be removed "easily". A store
>
>>>>>>>>>>>>>>>>>>>>
followed by a lifetime
> >>>>>>>>>>>>>>>>>>
marker
>
>>>>>>>>>>>>>>>>>>>>
or a load following a lifetime marker
>
>>>>>>>>>>>>>>>>>>>>
is dead or results in undef respectively.
>
>>>>>>>>>>>>>>>>>>>>
>
>>>>>>>>>>>>>>>>>>>>
A3: We could not use lifetime to argue about
>
>>>>>>>>>>>>>>>>>>>>
addresses. This means
> >>>>>>>>>>>>>> we
>
>>>>>>>>>>>>>>>>>>>>
could/should also not argue that overlapping
>
>>>>>>>>>>>>>>>>>>>>
lifetimes result in different addresses.
>
>>>>>>>>>>>>>>>>>>>>
Thus, a comparison
> >>>>>>>>>>>>>>>>>>
between
>
>>>>>>>>>>>>>>>>>>>>
the address of two allocas could not
>
>>>>>>>>>>>>>>>>>>>>
immediately be folded. That said, there
>
>>>>>>>>>>>>>>>>>>>>
would be special
> >>>>>>>>>>>>>> cases
>
>>>>>>>>>>>>>>>>>>>>
though. Basically, if one of the allocas does
>
>>>>>>>>>>>>>>>>>>>>
not escape, other than the comparisons to
>
>>>>>>>>>>>>>>>>>>>>
be folded, we can
> >>>>>>>>>>>>>>>> fold
>
>>>>>>>>>>>>>>>>>>>>
them. Afterwards, coalescing or splitting
>
>>>>>>>>>>>>>>>>>>>>
would still be consistent because it is
>
>>>>>>>>>>>>>>>>>>>>
unobservable. The
> >>>>>>>>>>>>>>>> problem
>
>>>>>>>>>>>>>>>>>>>>
we have in-tree is that we fold even though
>
>>>>>>>>>>>>>>>>>>>>
the address is still observed (after the
>
>>>>>>>>>>>>>>>>>>>>
fold). It is still
> >>>>>>>>>>>>>>>>>>
unclear
>
>>>>>>>>>>>>>>>>>>>>
to me what the impact of this would be
>
>>>>>>>>>>>>>>>>>>>>
on real code. I suggested before that we
>
>>>>>>>>>>>>>>>>>>>>
run some
> >>>>>>>>>>>>>> experiments
> >>>>>>>>>>>>>>>>>>
first
>
>>>>>>>>>>>>>>>>>>>>
before we make any decision whatsoever.
>
>>>>>>>>>>>>>>>>>>>>
>
>>>>>>>>>>>>>>>>>>>>
This is pretty much saying that lifetime markers are
> >>>>>>>>>>>>>> `memset(undef)`,
> >>>>>>>>>>>>>>>> as
>
>>>>>>>>>>>>>>>>>>>>
you suggested before (I think).
>
>>>>>>>>>>>>>>>>>>>>
There are some implementation level differences but
>
>>>>>>>>>>>>>>>>>>>>
at the end of
> >>>>>>>>>>>>>> the
>
>>>>>>>>>>>>>>>>>>>>
day they are basically the same.
>
>>>>>>>>>>>>>>>>>>>>
>
>>>>>>>>>>>>>>>>>>>>
Happy to hear some thoughts on this, especially if
>
>>>>>>>>>>>>>>>>>>>>
it fixes the
> >>>>>>>>>>>>>>>> problems
>
>>>>>>>>>>>>>>>>>>>>
that lead to D93376 in the first place.
>
>>>>>>>>>>>>>>>>>>>>
>
>>>>>>>>>>>>>>>>>>>>
~ Johannes
>
>>>>>>>>>>>>>>>>>>>>
>
>>>>>>>>>>>>>>>>>>>>
>
>>>>>>>>>>>>>>>>>>>>
On 12/18/20 2:42 AM, Juneyoung Lee via llvm-dev wrote:
>
>>>>>>>>>>>>>>>>>>>>>
Hello all,
>
>>>>>>>>>>>>>>>>>>>>>
>
>>>>>>>>>>>>>>>>>>>>>
We're discussing the well-formedness of
>
>>>>>>>>>>>>>>>>>>>>>
lifetime.start/end
> >>>>>>>>>>>>>> intrinsic
>
>>>>>>>>>>>>>>>>>>>>
here (
>
>>>>>>>>>>>>>>>>>>>>>
https://reviews.llvm.org/D93376), deciding what is a
> >>>>>>>>>>>>>> (syntactically
> >>>>>>>>>>>>>>>> &
>
>>>>>>>>>>>>>>>>>>>>>
semantically) valid use of these intrinsics.
>
>>>>>>>>>>>>>>>>>>>>>
>
>>>>>>>>>>>>>>>>>>>>>
I'd like to gather more context about the intrinsics.
>
>>>>>>>>>>>>>>>>>>>>>
>
>>>>>>>>>>>>>>>>>>>>>
1. Is there a frontend or framework that
>
>>>>>>>>>>>>>>>>>>>>>
introduces lifetime call
> >>>>>>>>>>>>>>>> with
>
>>>>>>>>>>>>>>>>>>>>>
non-stack allocated objects?
>
>>>>>>>>>>>>>>>>>>>>>
If exists, which behavior do they expect from it?
>
>>>>>>>>>>>>>>>>>>>>>
>
>>>>>>>>>>>>>>>>>>>>>
2. Can a block's lifetime be started bytewise or
>
>>>>>>>>>>>>>>>>>>>>>
elementwise?
>
>>>>>>>>>>>>>>>>>>>>>
I imagine an optimization that allow using stack
>
>>>>>>>>>>>>>>>>>>>>>
very compactly,
> >>>>>>>>>>>>>> but
>
>>>>>>>>>>>>>>>>>>>>
wonder
>
>>>>>>>>>>>>>>>>>>>>>
there is a real-world use case.
>
>>>>>>>>>>>>>>>>>>>>>
>
>>>>>>>>>>>>>>>>>>>>>
Thanks,
>
>>>>>>>>>>>>>>>>>>>>>
Juneyoung
>
>>>>>>>>>>>>>>>>>>>>>
>
>>>>>>>>>>>>>>>>>>>>>
>
>>>>>>>>>>>>>>>>>>>>>
_______________________________________________
>
>>>>>>>>>>>>>>>>>>>>>
LLVM Developers mailing list
>
>>>>>>>>>>>>>>>>>>>>>
llvm-dev at lists.llvm.org
>
>>>>>>>>>>>>>>>>>>>>>
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>
> >>>>>>>
> >>>>>
> >>>
> >
>

-- 

Juneyoung Lee
Software Foundation Lab, Seoul National University
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20210209/533ff192/attachment-0001.html>

llvm dev - Feb 2021 - lifetime.start/end

[llvm-dev] lifetime.start/end

[llvm-dev] lifetime.start/end

[llvm-dev] lifetime.start/end

[llvm-dev] lifetime.start/end