On 2/8/21 11:46 PM, Juneyoung Lee wrote:> Hi,
>
> About syntactic constraints - I think it can be helpful for transformations
> that need to be fully aware of all allocations' lifetimes.
> For example, AddressSanitizer is currently simply giving up instrumenting
> things if there is an unknown lifetime usage.
> If all lifetime intrinsics' pointers are guaranteed to be known
> expressions, this issue is resolved.
It is "resolved" by disallowing more complex use cases. To hyperbolize
an analogy: Constant propagation could avoid almost all of its
checks/analysis
if we restrict all expressions to be constant 0. I know this is not the same
thing but I hope you see where I'm getting at. The proper solution for ASAN
is to check if it can handle the lifetime markers and not utilize them
otherwise.
If they happen to fall in the proposed syntactic restrictions I assume ASAN
can already handle them fine anyway, right?
>
> if (HasUntracedLifetimeIntrinsic) {
> // If there are lifetime intrinsics which couldn't be traced back to an
> // alloca, we may not know exactly when a variable enters scope, and
> // therefore should "fail safe" by not poisoning them.
> StaticAllocaPoisonCallVec.clear();
> DynamicAllocaPoisonCallVec.clear();
> }
>
> Interestingly, AddressSanitizer is the one that raises an issue as well if
> the syntactic restriction is enforced.
> It replaces allocas with __asan_stack_malloc_N, but still leaves its
> lifetime uses, which breaks the syntactic restriction. A possible solution
> for this is to simply remove the lifetime calls.
I can't stress this enough, syntactic restrictions are rarely helpful.
Now we are talking about dropping information because of it. We seem
to have also fond a non-stack memory user, great.
~ Johannes
> Juneyoung
>
> On Mon, Feb 1, 2021 at 2:36 PM Johannes Doerfert <johannesdoerfert at
gmail.com>
> wrote:
>
>> On 1/30/21 7:40 AM, Ralf Jung wrote:
>>> Hi Johannes,
>>>
>>>>>> I was opposing the proposal because of the brittle
syntactic
>>>>>> restrictions. In my attempt to justify not having those
I also
>>>>>> tried to resolve the above inconsistency in a more
general manner
>>>>>> because I thought that was what it was all about. I
believe that we
>>>>>> either have to argue the address of a dead object is
>>>>>> unusable/undetermined, basically the direction of James
comment, or
>>>>>> we need to decouple lifetime markers from locations.
>>>>> Folding to false and coalescing are inconsistent with each
other,
>>>>> yes. It is just wrong to perform both of these on the above
program,
>>>>> I would say (if we interpret it with LLVM semantics, which
I assume
>>>>> has no "pointer zapping"). Is that up for debate?
>>>>>
>>>>> However, I do not see the immediate connection to lifetime
markers
>>>>> -- the same would be true for a program using malloc/free.
>>>>> Coalescing is permissible whenever lifetimes are disjoint.
Folding
>>>>> to false is only permissible if the pointers are
*definitely not
>>>>> equal*, e.g. if their lifetimes overlap.
>>>>>
>>>> I still believe folding can be done regardless of their
lifetime, as
>>>> long as it is done consistent. Fold all pointer observations
and the
>>>> actual
>>>> placement is not observable, we went down that road.
>>> Sure, but that's the kind of exception that always applies when
>>> discussing optimizations.
>>> But to make my statement more precise: you cannot fold comparison
of
>>> *escaped* pointers to false unless the pointers are *definitely not
>>> equal*, e.g. if their lifetimes overlap.
>>>
>> The interesting part is, lifetimes change. I continue to argue you
>> cannot fold if both pointers have escaped. If you do, and other things
>> happen that fiddle with the lifetime, you might accidentally coalesce
>> them (explicitly or implicitly).
>>
>>
>>>> I agree with the malloc/free comparison, I did argue that the
entire
>>>> time, or I tried.
>>>>
>>>> If there is no/little connection to lifetime markers, I assume
>>>> https://reviews.llvm.org/D93376 did not fix the inconsistency
then.
>>>> It was,
>>>> apparently, just an approach to clarify what a lifetime marker
is,
>>>> right? I might be more confused than before. I believe you
should equate
>>>> lifetime markers and memset(undef) and put the entire
discussion
>>>> about folding/coalescing onto a foundation that is unrelated to
lifetime
>>>> markers.
>>> My understanding is that lifetime markers are *explicitly intended
to
>>> support coalescing* by permitting later phases to put allocas with
>>> disjoint lifetimes into the same stack slot. D93376 intended to
keep
>>> it that way; fundamentally changing what lifetime markers are about
by
>>> making them only about the content and not the address of the
alloca
>>> seemed like a much more invasive change.
>>> Am I misunderstanding something?
>> I do think they were designed for that, yes. And I do agree my change
>> looks more invasive from that perspective.
>>
>> Now does D93376 help with the problems we have is my question. Or,
asked
>> differently, what is a way out of this which improves the big picture,
>> and what is practical. For the first, limiting it to allocas seems to
be
>> the wrong way. So far, the argument seems to be that "we only use
it for
>> them", but I'm not sure why a use for other pointers would be
somehow
>> bad (https://godbolt.org/z/G8obj3). Second, limiting it to syntactic
>> constructs is the opposite of practical. It has been said before,
>> syntactic constraints are brittle. Beyond that, the syntactic def-use
>> chain requirement has not even come up in any of this mails, at least
as
>> far as I understand it. It still is unclear to me how that would help.
>>
>> That all said, let's see what we need:
>> 1) A way to ensure we don't create inconsistent situations wrt.
>> allocation addresses where there were none.
>> 2) A way to ensure we can reasonably coalesce allocas to minimize stack
>> usage.
>>
>> For one, I feel we converged pretty much on the fact that pointer
>> comparison folding is a problem. In the example below we have lifetime
>> markers and we use them to argue about lifetimes of the allocations,
but
>> we still agree the folding should not happen. Once the folding is
>> restricted to pointer pairs for which at least one doesn't have
it's
>> address observed, I would like to revisit 1). So far, I believe it
would
>> be resolved. For 2), we could stick with what we have now, or adapt
>> something new afterwards. I guess the existing wording needs to be
>> cleared up either way.
>>
>> Long story short, I don't see how D93376 addresses what seems to be
the
>> problem.
>>
>>
>>
>>>>> The syntactic condition also provides a clear path to
evaluate
>>>>> whether any given folding or coalescing is correct.
>>>> I disagree. The syntactic conditions, at least as I read them,
allow
>>>> the above program with lifetime markers (as we have it now) but
did not
>>>> suggest it would be wrong to fold and later coalesce. Or maybe
I
>>>> missed why the syntactic restrictions would prevent that,
please
>>>> explain then.
>>> Where in that program would you place the lifetime markers? I
assume
>>> at the beginning and end of the local variable's scope?
>>> So that would become something like (also reflecting that allocas
are
>>> all at the top):
>>>
>>> void foo() {
>>> intptr_t xp, yp;
>>> int x, y;
>>> { lifetime.start(&x); xp = &x; bar(&x);
lifetime.end(&x); }
>>> { lifetime.start(&y); yp = &y; bar(&y);
lifetime.end(&y); }
>>> print(xp == yp); // can we fold this comparison
to
>>> false?
>>> }
>>>
>> Yes, this is how clang would generate the code right now.
>>
>>
>>> I think evaluating correctness of folding to false here is quite
clear
>>> (I assume this is just using C syntax to represent LLVM IR, so we
>>> applyLLVM IR semantics):
>> Agreed.
>>
>>
>>> - The program as-written is non-deterministic; x and y have
disjoint
>>> lifetime so they could have the same or a different address.
>> Agreed (for the C version for which the above is the LLVM-IR
>> implementation).
>>
>>
>>> - When the compiler decides to fold the comparison to false, it
means
>>> it has to commit to one of the possible non-det choices, and needs
to
>>> stick to that choice.
>> Agreed.
>>
>>
>>> So *just* replacing "xp == yp" by "false" is
incorrect in the sense
>>> that the new program has a possible behavior that did not occur
with
>>> the old program: "true false" can now be printed (*if*
the allocator
>>> chooses to put x and y into the same location), whereas the
original
>>> program would never print "true false".
>> Agreed.
>>
>>
>>> Note that we do not need to talk about coalescing here, we are just
>>> exploiting the fact that allocation is non-deterministic and that
>>> allocations with non-overlapping lifetime can be in the same spot
or
>>> in different spots.
>> Sure. When I refer to coalescing I was also including "implicit
>> coalescing", e.g., as happened in my outlining example.
>>
>>
>>> IOW, coalescing "x" and "y" in this program is
correct, but folding
>>> the comparison to false is incorrect.
>> Agreed, assuming we don't do anything to bar.
>>
>>
>>> So when you disagree with my statement that we have "a clear
path to
>>> evaluate whether any given folding or coalescing is correct",
then
>>> which of the above to you disagree with?
>> None. I disagreed with the part "The syntactic condition also
provides a
>> clear path ...", which implies this path was dependent on the
syntactic
>> condition. I don't see you making an argument about syntactic
def-use
>> chains above.
>>
>> I am basically struggling to understand why we need to introduce
>> syntactic restrictions on lifetime markers and why they need to be
>> limited to allocas. Those are two concepts included in D93376, right?
>>
>>
>>> Folding to false would be correct if it would be accompanied by
also
>>> embedding into the program some hint which tells the allocator that
>>> the two allocations must not be at the same spot in memory. This
would
>>> be a way to ensure that LLVM "sticks to its choice". But
that issue is
>>> entirely orthogonal to the semantics of lifetime.start /
lifetime.end;
>>> it arises with allocation lifetimes of any kind, including
malloc/free.
>>>
>>> I think you are proposing to make them only about the contents of
the
>>> alloca, so in the above "foo", now the lifetime of (the
allocations
>>> backing) "x" and "y" *would* overlap, so
folding to false (of escaped
>>> pointers) is... still not allowed, but now for less subtle reasons.
>>> But this doesn't *solve* the "fold to false" problem,
it just means
>>> that the problem no longer applies to this particular program.
>> My original email in this thread addressed the folding problem but I
>> clarified since: Don't fold if both addresses are observed,
regardless
>> of any lifetime argument. You seem to mix the comment I made on phab
>> last year with the email discussion. The phab conversation is outdated
>> and replaced by this email thread.
>>
>>
>>> Or is your argument that, since "fold to false" is wrong
in both
>>> cases, it doesn't really matter which choice we make? However,
in
>>> your proposal, the option of doing the fold while also "adding
hints
>>> for the allocator" no longer exists.
>> My proposal comes in two parts: Fix the folding issue, then revisit
>> markers. I'm not convinced that the folding issue can be fixed by
"hints
>> for the allocator" but I believe the escape argument holds.
>>
>>
>>
>>>> I don't think anyone wanted that. TBH, I think my proposal
is by far
>>>> the most compatible one wrt. the "classic" model w/o
lifetime
>>>> markers, given
>>>> that I proposed not to entangle them in object allocation.
Everything
>>>> else I proposed are "as-if" transformations. Do you
disagree?
>>> I agree that making the markers just erase the contents of the
alloca
>>> is the simplest proposal on the table. *If* we truly can make LLVM
>>> follow that model, that would be very nice.
>>> I would just be very surprised if it was possible to get this
proposal
>>> past the people that will carefully watch over the existing
>>> optimization potential LLVM has, and make sure it does not become
>>> smaller. ;) Maybe I am too pessimistic, but I would have never
dared
>>> even making such a proposal as I would expect it to be shot down.
I'll
>>> be happy to be proven wrong about this. :)
>> I've made a few actually, `mustprogress` was the latest and you
were
>> involved ;). It does prevent certain optimizations we did before
>> unconditionally but it also fixes a conceptual problem. At least the
>> pointer inconsistency is a conceptual problem that needs fixing. What
we
>> do with lifetime markers is probably orthogonal.
>>
>> ~ Johannes
>>
>>
>>> Kind regards,
>>> Ralf
>>>
>>>> ~ Johannes
>>>>
>>>>
>>>>
>>>>> Kind regards,
>>>>> Ralf
>>>>>
>>>>>> ~ Johannes
>>>>>>
>>>>>>
>>>>>>> Kind regards,
>>>>>>> Ralf
>>>>>>>
>>>>>>>> A lot of my arguments around alloca/malloc/...
is that we can
>>>>>>>> augment the information from the frontend by
determining if an
>>>>>>>> address is observed or not. If it is not, we
can limit its
>>>>>>>> allocation, coalesce it, etc. Lifetime markers
can help us do
>>>>>>>> this but they are not the only way.
>>>>>>>> So, basically, we do not need lifetime markers
to coalesce
>>>>>>>> alloca/malloc/... but that they can help.
>>>>>>>
>>>>>>>
>>>>>>>> Let's try to get out of this rabbit hole.
If you have a problem
>>>>>>>> with the proposal as outlined in
>>>>>>>>
https://lists.llvm.org/pipermail/llvm-dev/2021-January/147594.html
>>>>>>>> , please let me know. I assume dealing with
observed addresses
>>>>>>>> different than with non-observede ones (implied
by noescape) is
>>>>>>>> the way forward now.
>>>>>>>>
>>>>>>>> ~ Johannes
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> Kind regards,
>>>>>>>>> Ralf
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> Kind regards,
>>>>>>>>>>> Ralf
>>>>>>>>>>>
>>>>>>>>>>>> Does that make some sense?
>>>>>>>>>>>>
>>>>>>>>>>>> ~ Johannes
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> Kind regards,
>>>>>>>>>>>>> Ralf
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 1/7/21 10:12 PM,
Juneyoung Lee wrote:
>>>>>>>>>>>>>>> On Fri, Jan 8, 2021
at 8:54 AM Johannes Doerfert
>>>>>>>>>>>>>>>
<johannesdoerfert at gmail.com>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 1/7/21 4:55
PM, Juneyoung Lee wrote:
>>>>>>>>>>>>>>>>> It is,
alloca must have its integral address decided at
>>>>>>>>>>>>>>>>> its
*allocation*
>>>>>>>>>>>>>>>>> time as
well, like 0xFF00... .
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> For
example, if 0xFFFF0000 ~ 0xFF000000 is already
>>>>>>>>>>>>>>>>> allocated,
the new
>>>>>>>>>>>>>>>>> alloca
cannot be placed there unless it is not observed
>>>>>>>>>>>>>>>>> in the
future,
>>>>>>>>>>>>>>>>> according
to your proposal.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> But how do
we know from the current state whether the
>>>>>>>>>>>>>>>>> stack
variable is
>>>>>>>>>>>>>>>>> going to be
observed or not?
>>>>>>>>>>>>>>>> With that
argument you could allocate all but 4 bytes, do
>>>>>>>>>>>>>>>> a malloc(4)
>>>>>>>>>>>>>>>> and know the
address of the returned pointer (assuming it
>>>>>>>>>>>>>>>> is not null).
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> What I try to
say is, either your scenario is part of the
>>>>>>>>>>>>>>>> model and
>>>>>>>>>>>>>>>> everything we
do is broken as you could "observe"
>>>>>>>>>>>>>>>> addresses
passively,
>>>>>>>>>>>>>>>> *or*, the
abstract machine we use for semantic reasoning
>>>>>>>>>>>>>>>> doesn't
permit
>>>>>>>>>>>>>>>> the above
reasoning. I really hope for the latter.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> That's a very
valid point..!
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Actually, I have a
memory model that addresses this problem.
>>>>>>>>>>>>>>> The gist is that we
can interpret each allocation
>>>>>>>>>>>>>>> instruction as
*creating
>>>>>>>>>>>>>>> 2 blocks* and
nondeterministically returning one of them.
>>>>>>>>>>>>>>> The other one is
marked as invalid but not freed
>>>>>>>>>>>>>>> immediately.
Deallocation
>>>>>>>>>>>>>>> frees both blocks.
>>>>>>>>>>>>>>> If there is not
enough space for having two blocks, it is
>>>>>>>>>>>>>>> treated as
>>>>>>>>>>>>>>> out-of-memory.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> It makes guessing
the address of an allocation according
>>>>>>>>>>>>>>> to memory layout
>>>>>>>>>>>>>>> invalid UB.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> p = malloc(4)
>>>>>>>>>>>>>>> // If p != null, it
is guaranteed there was at least two 4
>>>>>>>>>>>>>>> byte slots
>>>>>>>>>>>>>>> available, say
0x100~0x104 and 0x110~0x114.
>>>>>>>>>>>>>>> // Two blocks are
allocated at 0x110 and 0x114, and one of
>>>>>>>>>>>>>>> the addresses is
>>>>>>>>>>>>>>>
nondeterministically returned.
>>>>>>>>>>>>>>> // By the nature of
nondeterminism, all executions below
>>>>>>>>>>>>>>> should be
>>>>>>>>>>>>>>> well-defined
regardless of the address of p.
>>>>>>>>>>>>>>> *(int*)0x100 = 10;
// In an execution where p is 0x110,
>>>>>>>>>>>>>>> this raises UB.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The paper link is
here
>>>>>>>>>>>>>>>
<https://dl.acm.org/doi/10.1145/3276495> FYI.
>>>>>>>>>>>>>> Nice! Thanks for the
link :)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> To argue
differently: Who is to say there is a stack, or
>>>>>>>>>>>>>>> only one,
>>>>>>>>>>>>>>>> or that alloca
allocates memory "on the one stack"? That
>>>>>>>>>>>>>>>> is not part of
>>>>>>>>>>>>>>>> the IR, IMHO. I
can write a machine on which alloca
>>>>>>>>>>>>>>>> lowers to
malloc,
>>>>>>>>>>>>>>>> I don't
even need the free during stack unwind but that I
>>>>>>>>>>>>>>>> could do as
>>>>>>>>>>>>>>>> well if I
wanted to.
>>>>>>>>>>>>>>> This is right, the
example was for illustrative purposes.
>>>>>>>>>>>>>>> IR does not
>>>>>>>>>>>>>>> enforce alloca to
be placed at 'stack'.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Making
icmp/ptrtoint yield poison will still make loop
>>>>>>>>>>>>>>>>> versioning
or
>>>>>>>>>>>>>>>>> pointer
rewriting transformations unsound because these
>>>>>>>>>>>>>>>>> operations
now
>>>>>>>>>>>>>>>> can
>>>>>>>>>>>>>>>>> create
poison (even if pointers are noundef).
>>>>>>>>>>>>>>>> I did not say
they yield poison, at least I did not try
>>>>>>>>>>>>>>>> to say that.
>>>>>>>>>>>>>>>> What are you
referring to exactly?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I was referring to
this:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> But this
makes ptrtoint/icmp make UB-raising
>>>>>>>>>>>>>>>>>
instructions, which
>>>>>>>>>>>>>>>>> contradicts
with what LLVM does.
>>>>>>>>>>>>>>>> As with other
violation of attributes I would, on first
>>>>>>>>>>>>>>>> though, suggest
>>>>>>>>>>>>>>>> to produce
poison, not UB.
>>>>>>>>>>>>>>> But it is more
about the (imaginary) attribute, so maybe I
>>>>>>>>>>>>>>> was slightly out
>>>>>>>>>>>>>>> of topic.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 2) is fine, I
think the suggestion semantically makes
>>>>>>>>>>>>>>>> sense
perfectly. 1)
>>>>>>>>>>>>>>>>> is
something I'm concerned about now.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> There are
more than pointer foldings, such as rewriting
>>>>>>>>>>>>>>>>> such
expression,
>>>>>>>>>>>>>>>>> code motion
ptr cmp, introduce ptr cmp, etc. There is
>>>>>>>>>>>>>>>>> also
analysis
>>>>>>>>>>>>>>>> relying
>>>>>>>>>>>>>>>>> on ptr cmp.
>>>>>>>>>>>>>>>>> Defining
the correctness of each of them is something we
>>>>>>>>>>>>>>>>> want to
avoid,
>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>> maybe
that's why we want to define precise semantics for
>>>>>>>>>>>>>>>>> things.
>>>>>>>>>>>>>>>> I don't get
the point. My proposal does not change the
>>>>>>>>>>>>>>>> semantics of
>>>>>>>>>>>>>>>> pointer
comparisons, at all. I explicitly mentioned that
>>>>>>>>>>>>>>>> in the last
>>>>>>>>>>>>>>>> email.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Oh okay, I thought
it was a part of the lifetime proposal,
>>>>>>>>>>>>>>> but it seems
>>>>>>>>>>>>>>> more like a
separate thing.
>>>>>>>>>>>>>>> I agree that this
requires performance impact.
>>>>>>>>>>>>>>> Also investigation
of existing transformations would be
>>>>>>>>>>>>>>> needed;
Alive2's
>>>>>>>>>>>>>>> pointer comparison
is doing approximation yet, but if it
>>>>>>>>>>>>>>> becomes fully
>>>>>>>>>>>>>>> precise, it will
show something from running LLVM unit
>>>>>>>>>>>>>>> tests I believe..!
:)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I think this
will be less aggressive and may give nice
>>>>>>>>>>>>>>>> feedback to
>>>>>>>>>>>>>>>>> potential
projects that are using lifetime with non-alloca.
>>>>>>>>>>>>>>>> The lifetime
marker debate, basically 2) above, is
>>>>>>>>>>>>>>>> orthogonal to
the
>>>>>>>>>>>>>>>> problem
>>>>>>>>>>>>>>>> you try to
solve. It got mixed in as lifetime markers
>>>>>>>>>>>>>>>> were used by
>>>>>>>>>>>>>>>> StackColoring
>>>>>>>>>>>>>>>> to perform
coalescing but that is coincidental. You can
>>>>>>>>>>>>>>>> (arguably)
>>>>>>>>>>>>>>>> coalesce stack
>>>>>>>>>>>>>>>> allocations
regardless of lifetime markers and with 1)
>>>>>>>>>>>>>>>> such a
>>>>>>>>>>>>>>>> transformation
>>>>>>>>>>>>>>>> (w/ and w/o
lifetime markers) would actually be sound.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> At the end,
we can implement IR writer that lowers
>>>>>>>>>>>>>>>>> lifetime
with
>>>>>>>>>>>>>>>> non-alloca
>>>>>>>>>>>>>>>>> into
memset(undef). WDYT?
>>>>>>>>>>>>>>>> Yeah, 2) is
orthogonal and we can lower it that way.
>>>>>>>>>>>>>>>> Unsure if it is
>>>>>>>>>>>>>>>> helpful
>>>>>>>>>>>>>>>> but we can
certainly define it that way in the LangRef.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Okay, thanks!
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> ~ Johannes
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>> Juneyoung
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> p.s. The
reply was late, sorry. I think I can spend more
>>>>>>>>>>>>>>>>> time on
this
>>>>>>>>>>>>>>>> today.
>>>>>>>>>>>>>>>>> On Thu, Jan
7, 2021 at 9:02 AM Johannes Doerfert <
>>>>>>>>>>>>>>>>
johannesdoerfert at gmail.com>
>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On
1/6/21 4:33 PM, Juneyoung Lee wrote:
>>>>>>>>>>>>>>>>>>>>>
Stepwisely defining the semantics of instructions is
>>>>>>>>>>>>>>>>>>>>>
a desirable
>>>>>>>>>>>>>>>>>>>
direction
>>>>>>>>>>>>>>>>>>>>>
IMO.
>>>>>>>>>>>>>>>>>>>>
I'm confused. What in the proposal would prevent us
>>>>>>>>>>>>>>>>>>>>
from defining
>>>>>>>>>>>>>>>>>>>>
the semantics of instructions, or force us to do it
>>>>>>>>>>>>>>>>>>>>
in an "undesirable
>>>>>>>>>>>>>>>>>>>
way"?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I
meant it would be great if the output state after
>>>>>>>>>>>>>>>>>>>
executing an
>>>>>>>>>>>>>>>>>>>
instruction can be described using its input state.
>>>>>>>>>>>>>>>>>>>
(that was the meaning of 'stepwise semantics', I
>>>>>>>>>>>>>>>>>>>
should have been more
>>>>>>>>>>>>>>>>>>>
clear about this)
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> For
example, the semantics of 'z = add x y' can be
>>>>>>>>>>>>>>>>>>>
defined as follows:
>>>>>>>>>>>>>>>>>>>
Given an input state s, next state s' = s[z -> s(x) +
>>>>>>>>>>>>>>>>>>>
s(y)]
>>>>>>>>>>>>>>>>>>>
where s(x) is the value of x in the previous state,
>>>>>>>>>>>>>>>>>>> and
s[z -> v] is a
>>>>>>>>>>>>>>>>>>>
state with z updated to v.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
Another example that involves memory: the semantics of
>>>>>>>>>>>>>>>>>>>
'i = ptrtoint p'
>>>>>>>>>>>>>>>>>> can
>>>>>>>>>>>>>>>>>>> be
defined as follows:
>>>>>>>>>>>>>>>>>>>
Given an input state s, next state s' = s[i ->
>>>>>>>>>>>>>>>>>>>
s(p).obj.address +
>>>>>>>>>>>>>>>>>>>
s(p).offset]
>>>>>>>>>>>>>>>>>>>
where obj.address is the begin address of a memory
>>>>>>>>>>>>>>>>>>>
object obj pointed
>>>>>>>>>>>>>>>> by
>>>>>>>>>>>>>>>>>> p
>>>>>>>>>>>>>>>>>>>
& offset is p's byte offset. (Imagine a pointer to the
>>>>>>>>>>>>>>>>>>>
offset of some
>>>>>>>>>>>>>>>>>> char
>>>>>>>>>>>>>>>>>>>
array).
>>>>>>>>>>>>>>>>>>>
Note that ptrtoint & add can be nicely defined w.r.t
>>>>>>>>>>>>>>>>>>> the
input state.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
Now, the instruction that we're facing is 'p = alloca'.
>>>>>>>>>>>>>>>>>>> To
describe the output state after executing 'p
>>>>>>>>>>>>>>>>>>>
alloca', the address
>>>>>>>>>>>>>>>> of
>>>>>>>>>>>>>>>>>>> new
alloca should be determined.
>>>>>>>>>>>>>>>>>>> If
observedness is involved, we need to know the
>>>>>>>>>>>>>>>>>>>
future state again. :/
>>>>>>>>>>>>>>>>>> We
>>>>>>>>>>>>>>>>>>>
don't know whether the alloca is going to be observed
>>>>>>>>>>>>>>>>>>> or
not without
>>>>>>>>>>>>>>>>>> seeing
>>>>>>>>>>>>>>>>>>> the
future.
>>>>>>>>>>>>>>>>>>>
This is the problem of the current lifetime intrinsics
>>>>>>>>>>>>>>>>>>> as
well.
>>>>>>>>>>>>>>>>>> No, you
mix things up here.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Nobody
proposed to modify the semantics of `alloca`.
>>>>>>>>>>>>>>>>>>
`alloca` provides you with a fresh, unobserved block of
>>>>>>>>>>>>>>>>>>
dereferenceable memory that is implicitly freed as the
>>>>>>>>>>>>>>>>>> stack
>>>>>>>>>>>>>>>>>>
unwinds. That is it. No context necessary.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> If you
want to modify the IR, you need to argue the
>>>>>>>>>>>>>>>>>>
observable
>>>>>>>>>>>>>>>>>>
semantics which is nothing new. That this might require
>>>>>>>>>>>>>>>>>> more
than
>>>>>>>>>>>>>>>>>> a
peephole view of the program is also not new.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> One
possible approach to resolve this is adding an
>>>>>>>>>>>>>>>>>>>
'unobserved' flag to
>>>>>>>>>>>>>>>>>>>
alloca instruction (similar to what was suggested by
>>>>>>>>>>>>>>>>>>>
Nicolai).
>>>>>>>>>>>>>>>>>>>
And, we can say that if alloca with 'unobserved' is
>>>>>>>>>>>>>>>>>>>
used by
>>>>>>>>>>>>>>>>>>
ptrtoint/icmp,
>>>>>>>>>>>>>>>>>>> it
is UB.
>>>>>>>>>>>>>>>>>> The
flag can be added, like we add other attributes. It
>>>>>>>>>>>>>>>>>> should
not
>>>>>>>>>>>>>>>>>> be
required for any optimization we talked about
>>>>>>>>>>>>>>>>>> though.
It basically
>>>>>>>>>>>>>>>>>> is a
way to manifest derived or given information into
>>>>>>>>>>>>>>>>>> the IR.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
Attribute deduction, as well as frontends with domain
>>>>>>>>>>>>>>>>>>
knowledge,
>>>>>>>>>>>>>>>>>> can add
such information. The flag we discussed in phab
>>>>>>>>>>>>>>>>>> was not
even
>>>>>>>>>>>>>>>>>>
sufficient for all the transformation examples I
>>>>>>>>>>>>>>>>>>
presented in my mail,
>>>>>>>>>>>>>>>>>> that is
why I extended my argument. We could still
>>>>>>>>>>>>>>>>>> have a
"noescape"
>>>>>>>>>>>>>>>>>> flag
for allocas, but I'm not sure how useful that
>>>>>>>>>>>>>>>>>> really
is. We can
>>>>>>>>>>>>>>>>>>
certainly deduce it and manifest it, unsure if we have
>>>>>>>>>>>>>>>>>> domain
knowledge
>>>>>>>>>>>>>>>>>> we can
use for non-trivial cases though.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> But
this makes ptrtoint/icmp make UB-raising
>>>>>>>>>>>>>>>>>>>
instructions, which
>>>>>>>>>>>>>>>>>>>
contradicts with what LLVM does.
>>>>>>>>>>>>>>>>>> As with
other violation of attributes I would, on first
>>>>>>>>>>>>>>>>>> though,
suggest
>>>>>>>>>>>>>>>>>> to
produce poison, not UB.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
Also, existing optimizations like loop versioning can
>>>>>>>>>>>>>>>>>>>
introduce
>>>>>>>>>>>>>>>>>>>
ptrtoint/pointer comparisons too.
>>>>>>>>>>>>>>>>>> Sure. I
am not certain why that is a problem. I get the
>>>>>>>>>>>>>>>>>> feeling
things
>>>>>>>>>>>>>>>>>> are
still mixed up here.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> What I
proposed is twofold:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 1) We
stop folding comparisons between different
>>>>>>>>>>>>>>>>>> allocas
if changing the
>>>>>>>>>>>>>>>>>> address
>>>>>>>>>>>>>>>>>>
of both might be observable. Thus, if both might
>>>>>>>>>>>>>>>>>> have
their address
>>>>>>>>>>>>>>>>>>
"taken"/escaped,
>>>>>>>>>>>>>>>>>>
other than the comparisons we want to fold, we
>>>>>>>>>>>>>>>>>> cannot
proceed.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 2) We
define lifetime markers to mean `memset(undef)`.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> The
first should be sufficient for the problem you were
>>>>>>>>>>>>>>>>>> trying
to solve
>>>>>>>>>>>>>>>>>> in the
>>>>>>>>>>>>>>>>>> first
place. The second makes lifetime markers less
>>>>>>>>>>>>>>>>>> weird.
Note that 1)
>>>>>>>>>>>>>>>>>> is not
changing
>>>>>>>>>>>>>>>>>> the
semantics of the IR. We basically just argue there
>>>>>>>>>>>>>>>>>> is a
bug in our
>>>>>>>>>>>>>>>>>>
instcombine right
>>>>>>>>>>>>>>>>>> now as
we do not check all necessary preconditions.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I
see that there are other questions that I didn't
>>>>>>>>>>>>>>>>>>>
answer yet, but let
>>>>>>>>>>>>>>>> me
>>>>>>>>>>>>>>>>>>>
answer this first to limit the length of the text :)
>>>>>>>>>>>>>>>>>> Sure,
we can split the discussion :)
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> ~
Johannes
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
Thanks,
>>>>>>>>>>>>>>>>>>>
Juneyoung
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On
Thu, Jan 7, 2021 at 3:36 AM Johannes Doerfert <
>>>>>>>>>>>>>>>>>>
johannesdoerfert at gmail.com>
>>>>>>>>>>>>>>>>>>>
wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
On 1/5/21 8:00 PM, Juneyoung Lee wrote:
>>>>>>>>>>>>>>>>>>>>>
Hi Johannes,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
I read your proposal and thought about the model.
>>>>>>>>>>>>>>>>>>>>
Cool, thanks!
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
As you concerned in A3, certain programs may be
>>>>>>>>>>>>>>>>>>>>>
valid only when
>>>>>>>>>>>>>>>> memory
>>>>>>>>>>>>>>>>>>>>>
blocks with overlapping lifetimes have disjoint
>>>>>>>>>>>>>>>>>>>>>
addresses.
>>>>>>>>>>>>>>>>>>>>>
Look at this example (I'm using malloc, but alloca
>>>>>>>>>>>>>>>>>>>>>
also works):
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
p1 = malloc(4)
>>>>>>>>>>>>>>>>>>>>>
p2 = malloc(4) // for brevity, assume that there as
>>>>>>>>>>>>>>>>>>>>>
enough space & p1
>>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>>
p2 != null
>>>>>>>>>>>>>>>>>>>>>
set<char*> s;
>>>>>>>>>>>>>>>>>>>>>
s.insert(p1); s.insert(p2); // If the second insert
>>>>>>>>>>>>>>>>>>>>>
did nothing, it
>>>>>>>>>>>>>>>>>> would
>>>>>>>>>>>>>>>>>>>>>
be surprise to programmers
>>>>>>>>>>>>>>>>>>>>>
work(s);
>>>>>>>>>>>>>>>>>>>>>
free(data1)
>>>>>>>>>>>>>>>>>>>>>
free(data2)
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
Clearly, IR semantics should guarantee that escaped
>>>>>>>>>>>>>>>>>>>>>
blocks are
>>>>>>>>>>>>>>>>>>
disjoint.
>>>>>>>>>>>>>>>>>>>>
It
>>>>>>>>>>>>>>>>>>>>>
would be great for verification tools on LLVM IR to
>>>>>>>>>>>>>>>>>>>>>
be able to answer
>>>>>>>>>>>>>>>>>>>>
that
>>>>>>>>>>>>>>>>>>>>>
the second insert will succeed.
>>>>>>>>>>>>>>>>>>>>
I agree, the second insert should succeed, assuming
>>>>>>>>>>>>>>>>>>>>
`p1 && p2`.
>>>>>>>>>>>>>>>>>>>>
I don't think my proposal would in any way impact the
>>>>>>>>>>>>>>>>>>>>
program above,
>>>>>>>>>>>>>>>>>>>>
if anything the A3 reasoning makes sure such a
>>>>>>>>>>>>>>>>>>>>
program with allocas
>>>>>>>>>>>>>>>>>>>>
is not miscompiled.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
I'm also not sure I understand what you try to argue
>>>>>>>>>>>>>>>>>>>>
for. Maybe
>>>>>>>>>>>>>>>>>>>>
elaborate a bit what it is you think is bad or needs
>>>>>>>>>>>>>>>>>>>>
to be changed.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
The problem is that definition of escapedness is not
>>>>>>>>>>>>>>>>>>>>>
clear at the
>>>>>>>>>>>>>>>>>>>>
semantic
>>>>>>>>>>>>>>>>>>>>>
level. Describing the IR semantics w.r.t. LLVM's
>>>>>>>>>>>>>>>>>>>>>
escaped analysis
>>>>>>>>>>>>>>>>>>>>
function
>>>>>>>>>>>>>>>>>>>>>
isn't something we want.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
Semantic definition of escapedness of a pointer
>>>>>>>>>>>>>>>>>>>>>
seems hard, I mean
>>>>>>>>>>>>>>>> in a
>>>>>>>>>>>>>>>>>>>>>
stepwise manner.
>>>>>>>>>>>>>>>>>>>>>
It isn't a single instruction such as 'escape i8*
>>>>>>>>>>>>>>>>>>>>>
ptr', and we need
>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>
look
>>>>>>>>>>>>>>>>>>>>>
over all instructions in the function. For example,
>>>>>>>>>>>>>>>>>>>>>
'(int)(p+1) -
>>>>>>>>>>>>>>>>>>
(int)p'
>>>>>>>>>>>>>>>>>>>>>
isn't semantically escaping the pointer p because
>>>>>>>>>>>>>>>>>>>>>
the result is 1
>>>>>>>>>>>>>>>>>>>>>
regardless of the value of p.
>>>>>>>>>>>>>>>>>>>>>
Stepwisely defining the semantics of instructions is
>>>>>>>>>>>>>>>>>>>>>
a desirable
>>>>>>>>>>>>>>>>>>>>
direction
>>>>>>>>>>>>>>>>>>>>>
IMO.
>>>>>>>>>>>>>>>>>>>>
I'm confused. What in the proposal would prevent us
>>>>>>>>>>>>>>>>>>>>
from defining
>>>>>>>>>>>>>>>>>>>>
the semantics of instructions, or force us to do it
>>>>>>>>>>>>>>>>>>>>
in an "undesirable
>>>>>>>>>>>>>>>>>>>>
way"?
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
In practice, syntactically checking escapedness +
>>>>>>>>>>>>>>>>>>>>>
nice engineering
>>>>>>>>>>>>>>>>>> might
>>>>>>>>>>>>>>>>>>>>>
not break optimizations in most cases (as
>>>>>>>>>>>>>>>>>>>>>
undef/poison did); but it
>>>>>>>>>>>>>>>>>> would
>>>>>>>>>>>>>>>>>>>>>
be great to move to another level, since LLVM IR is
>>>>>>>>>>>>>>>>>>>>>
used in so many
>>>>>>>>>>>>>>>>>>>>
places
>>>>>>>>>>>>>>>>>>>>>
:)
>>>>>>>>>>>>>>>>>>>>
The property under which you can coalesce objects is
>>>>>>>>>>>>>>>>>>>>
simple:
>>>>>>>>>>>>>>>>>>>>
It is not observable.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
Now, if the address of one of the two objects you
>>>>>>>>>>>>>>>>>>>>
coalesce is not
>>>>>>>>>>>>>>>>>>>>
observed, coalescing is not observable. That is a
>>>>>>>>>>>>>>>>>>>>
sufficient condition
>>>>>>>>>>>>>>>>>>>>
not a necessary one. Pointer "escaping" is one step
>>>>>>>>>>>>>>>>>>>>
further. If the
>>>>>>>>>>>>>>>>>>>>
address doesn't escape it is not observed. This does
>>>>>>>>>>>>>>>>>>>>
not mean the
>>>>>>>>>>>>>>>>>>>>
"semantic conditions for coalescing", e.g., for the
>>>>>>>>>>>>>>>>>>>>
purpose of
>>>>>>>>>>>>>>>>>>
translation
>>>>>>>>>>>>>>>>>>>>
validation, is supposed to be build on top of our
>>>>>>>>>>>>>>>>>>>>
"definition of
>>>>>>>>>>>>>>>>>>
escaping
>>>>>>>>>>>>>>>>>>>>
pointers". That said, we use "does not escape" as a
>>>>>>>>>>>>>>>>>>>>
precondition for
>>>>>>>>>>>>>>>>>>>>
various transformation and I'm unsure what is any
>>>>>>>>>>>>>>>>>>>>
different now. The
>>>>>>>>>>>>>>>>>>>>
entire escape argument is only used in the validity
>>>>>>>>>>>>>>>>>>>>
of the pointer
>>>>>>>>>>>>>>>>>>
folding.
>>>>>>>>>>>>>>>>>>>>
Similarly, we can fold a comparison of a noalias
>>>>>>>>>>>>>>>>>>>>
pointer with another
>>>>>>>>>>>>>>>>>> one
>>>>>>>>>>>>>>>>>>>>
if the former does not escape (and both are
>>>>>>>>>>>>>>>>>>>>
dereferenced and one is
>>>>>>>>>>>>>>>>>>>>
written for sure).
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
The pointer comparison is another beast. It indeed
>>>>>>>>>>>>>>>>>>>>>
has a few issues,
>>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>>
solving it might require nontrivial solution.
>>>>>>>>>>>>>>>>>>>>
I think the main problem of the inconsistencies and
>>>>>>>>>>>>>>>>>>>>
such we've seen is
>>>>>>>>>>>>>>>>>>>>
rooted in the erroneous folding of pointer
>>>>>>>>>>>>>>>>>>>>
comparisons. Cleaning up
>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>
lifetime marker semantics is actually unrelated and
>>>>>>>>>>>>>>>>>>>>
simply not folding
>>>>>>>>>>>>>>>>>>>>
as described in A3 should solve the issue that has
>>>>>>>>>>>>>>>>>>>>
been reported.
>>>>>>>>>>>>>>>>>>>>
Nevertheless,
>>>>>>>>>>>>>>>>>>>>
we should take a crack at lifetime markers while we
>>>>>>>>>>>>>>>>>>>>
are here.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
~ Johannes
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
Thanks,
>>>>>>>>>>>>>>>>>>>>>
Juneyoung
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
On Tue, Jan 5, 2021 at 9:37 AM Johannes Doerfert <
>>>>>>>>>>>>>>>>>>>>
johannesdoerfert at gmail.com>
>>>>>>>>>>>>>>>>>>>>>
wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
Hi Juneyoung,
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
Happy new year :)
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
After we had a lengthy discussion on phab last
>>>>>>>>>>>>>>>>>>>>>>
year, I've tried to
>>>>>>>>>>>>>>>>>>>>>>
summarize my thoughts,
>>>>>>>>>>>>>>>>>>>>>>
especially given that I had some time to think
>>>>>>>>>>>>>>>>>>>>>>
about things over the
>>>>>>>>>>>>>>>>>>>>
break.
>>>>>>>>>>>>>>>>>>>>>>
Still, no promises on the quality ;)
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
I start with general questions I asked myself and
>>>>>>>>>>>>>>>>>>>>>>
then go on
>>>>>>>>>>>>>>>> rambling
>>>>>>>>>>>>>>>>>>>>>>
about a potential design.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
Q1: Is lifetime a given property or a derived one,
>>>>>>>>>>>>>>>>>>>>>>
thus is it fixed
>>>>>>>>>>>>>>>> or
>>>>>>>>>>>>>>>>>>>>>>
modifiable?
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
This is a question I asked myself a lot recently. I
>>>>>>>>>>>>>>>>>>>>>>
think it is
>>>>>>>>>>>>>>>>>> derived
>>>>>>>>>>>>>>>>>>>>>>
and modifiable,
>>>>>>>>>>>>>>>>>>>>>>
at least I hope it is. Only that would allow
>>>>>>>>>>>>>>>>>>>>>>
transformations I would
>>>>>>>>>>>>>>>>>>>>
want
>>>>>>>>>>>>>>>>>>>>>>
us to do. Here are some examples:
>>>>>>>>>>>>>>>>>>>>>>
https://godbolt.org/z/G8obj3
>>>>>>>>>>>>>>>>>>>>>>
https://godbolt.org/z/obaTc
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
Q2: Is a pointer comparison, or similar use,
>>>>>>>>>>>>>>>>>>>>>>
extending the lifetime?
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
Asked differently, can we hoist a pointer
>>>>>>>>>>>>>>>>>>>>>>
comparison into a region
>>>>>>>>>>>>>>>>>> where
>>>>>>>>>>>>>>>>>>>>>>
the pointer is dead?
>>>>>>>>>>>>>>>>>>>>>>
This is an important question which we haven't
>>>>>>>>>>>>>>>>>>>>>>
discussed much as we
>>>>>>>>>>>>>>>>>>>>>>
assumed LICM has to work.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
The current behavior is that non-dereferencing uses
>>>>>>>>>>>>>>>>>>>>>>
are not
>>>>>>>>>>>>>>>> extending
>>>>>>>>>>>>>>>>>>>>>>
the lifetime and are
>>>>>>>>>>>>>>>>>>>>>>
allowed outside of "lifetime regions" (as indicated
>>>>>>>>>>>>>>>>>>>>>>
by markers).
>>>>>>>>>>>>>>>> They
>>>>>>>>>>>>>>>>>>>>>>
will always produce valid
>>>>>>>>>>>>>>>>>>>>>>
results. Though, we might want to think about a
>>>>>>>>>>>>>>>>>>>>>>
lifetime marker that
>>>>>>>>>>>>>>>>>>>>>>
spits out a new pointer
>>>>>>>>>>>>>>>>>>>>>>
value instead of reusing the old one.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
Q3: Can we use lifetime to argue about addresses?
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
The question here is, can we fold address
>>>>>>>>>>>>>>>>>>>>>>
comparisons based on
>>>>>>>>>>>>>>>>>>>>>>
lifetimes, or not.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
So far, we fold comparisons based on "address
>>>>>>>>>>>>>>>>>>>>>>
information". For
>>>>>>>>>>>>>>>>>>
example,
>>>>>>>>>>>>>>>>>>>>>>
we "know" globals,
>>>>>>>>>>>>>>>>>>>>>>
allocas, and mallocs cannot be equal to one
>>>>>>>>>>>>>>>>>>>>>>
another. Also, two
>>>>>>>>>>>>>>>>>>
distinct
>>>>>>>>>>>>>>>>>>>>>>
allocations, for globals
>>>>>>>>>>>>>>>>>>>>>>
and allocas, are considered unequal. Now, the crux
>>>>>>>>>>>>>>>>>>>>>>
is that we have
>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>> be
>>>>>>>>>>>>>>>>>>>>>>
consistent if we do two
>>>>>>>>>>>>>>>>>>>>>>
comparisons, and, as of now, we are not (bug number
>>>>>>>>>>>>>>>>>>>>>>
missing). Since
>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>
backend (or any other place
>>>>>>>>>>>>>>>>>>>>>>
for that matter) might coalesce allocas, their
>>>>>>>>>>>>>>>>>>>>>>
addresses might not
>>>>>>>>>>>>>>>> be
>>>>>>>>>>>>>>>>>>>>>>
different after all. If we
>>>>>>>>>>>>>>>>>>>>>>
already folded a comparison to "unequal" we are
>>>>>>>>>>>>>>>>>>>>>>
doomed if we later
>>>>>>>>>>>>>>>>>> have
>>>>>>>>>>>>>>>>>>>>>>
a comparison that results
>>>>>>>>>>>>>>>>>>>>>>
in "equal". (Note, this is different from aliasing
>>>>>>>>>>>>>>>>>>>>>>
rules as they can
>>>>>>>>>>>>>>>>>> be
>>>>>>>>>>>>>>>>>>>>>>
stricter.)
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
Design:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
I would hope we can come up with a model that
>>>>>>>>>>>>>>>>>>>>>>
treats memory "the
>>>>>>>>>>>>>>>>>>
same",
>>>>>>>>>>>>>>>>>>>>>>
regardless if it is global,
>>>>>>>>>>>>>>>>>>>>>>
stack, or heap. I want to avoid special casing one
>>>>>>>>>>>>>>>>>>>>>>
of them wrt.
>>>>>>>>>>>>>>>>>>
lifetime
>>>>>>>>>>>>>>>>>>>>>>
as I believe most optimizations
>>>>>>>>>>>>>>>>>>>>>>
would apply to any of them, potentially for
>>>>>>>>>>>>>>>>>>>>>>
different reasons and
>>>>>>>>>>>>>>>> with
>>>>>>>>>>>>>>>>>>>>>>
different gains, but nevertheless.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
Proposal (largely based on the conversation in phab):
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
A1: Lifetime is a concept that talks about memory
>>>>>>>>>>>>>>>>>>>>>>
content *only*.
>>>>>>>>>>>>>>>>>>>>>>
Basically, the memory content is set to
>>>>>>>>>>>>>>>>>>>>>>
undefined by lifetime markers. It is
>>>>>>>>>>>>>>>>>>>>>>
derived/modifiable as
>>>>>>>>>>>>>>>> it
>>>>>>>>>>>>>>>>>> is
>>>>>>>>>>>>>>>>>>>>>>
just an "as-is" property of the memory
>>>>>>>>>>>>>>>>>>>>>>
content. The lifetimes of an object, as
>>>>>>>>>>>>>>>>>>>>>>
described by
>>>>>>>>>>>>>>>> markers,
>>>>>>>>>>>>>>>>>>>>
might
>>>>>>>>>>>>>>>>>>>>>>
change during the compilation. They
>>>>>>>>>>>>>>>>>>>>>>
might become smaller if we deduce the
>>>>>>>>>>>>>>>>>>>>>>
object is not accessed
>>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>>>
the memory content is not used, they
>>>>>>>>>>>>>>>>>>>>>>
might become larger if objects with
>>>>>>>>>>>>>>>>>>>>>>
non-overlapping
>>>>>>>>>>>>>>>> lifetimes
>>>>>>>>>>>>>>>>>> are
>>>>>>>>>>>>>>>>>>>>>>
coalesced. (One could see the latter as
>>>>>>>>>>>>>>>>>>>>>>
introducing a new object though.)
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
A2: If we define lifetime as above, it has nothing
>>>>>>>>>>>>>>>>>>>>>>
to do with the
>>>>>>>>>>>>>>>>>>>>>>
address of an object. Consequently, pointer
>>>>>>>>>>>>>>>>>>>>>>
comparisons and similar operations are
>>>>>>>>>>>>>>>>>>>>>>
valid outside the
>>>>>>>>>>>>>>>>>>
lifetime.
>>>>>>>>>>>>>>>>>>>>>>
Loads and stores are as well, they can
>>>>>>>>>>>>>>>>>>>>>>
even not be removed "easily". A store
>>>>>>>>>>>>>>>>>>>>>>
followed by a lifetime
>>>>>>>>>>>>>>>>>>>>
marker
>>>>>>>>>>>>>>>>>>>>>>
or a load following a lifetime marker
>>>>>>>>>>>>>>>>>>>>>>
is dead or results in undef respectively.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
A3: We could not use lifetime to argue about
>>>>>>>>>>>>>>>>>>>>>>
addresses. This means
>>>>>>>>>>>>>>>> we
>>>>>>>>>>>>>>>>>>>>>>
could/should also not argue that overlapping
>>>>>>>>>>>>>>>>>>>>>>
lifetimes result in different addresses.
>>>>>>>>>>>>>>>>>>>>>>
Thus, a comparison
>>>>>>>>>>>>>>>>>>>>
between
>>>>>>>>>>>>>>>>>>>>>>
the address of two allocas could not
>>>>>>>>>>>>>>>>>>>>>>
immediately be folded. That said, there
>>>>>>>>>>>>>>>>>>>>>>
would be special
>>>>>>>>>>>>>>>> cases
>>>>>>>>>>>>>>>>>>>>>>
though. Basically, if one of the allocas does
>>>>>>>>>>>>>>>>>>>>>>
not escape, other than the comparisons to
>>>>>>>>>>>>>>>>>>>>>>
be folded, we can
>>>>>>>>>>>>>>>>>> fold
>>>>>>>>>>>>>>>>>>>>>>
them. Afterwards, coalescing or splitting
>>>>>>>>>>>>>>>>>>>>>>
would still be consistent because it is
>>>>>>>>>>>>>>>>>>>>>>
unobservable. The
>>>>>>>>>>>>>>>>>> problem
>>>>>>>>>>>>>>>>>>>>>>
we have in-tree is that we fold even though
>>>>>>>>>>>>>>>>>>>>>>
the address is still observed (after the
>>>>>>>>>>>>>>>>>>>>>>
fold). It is still
>>>>>>>>>>>>>>>>>>>>
unclear
>>>>>>>>>>>>>>>>>>>>>>
to me what the impact of this would be
>>>>>>>>>>>>>>>>>>>>>>
on real code. I suggested before that we
>>>>>>>>>>>>>>>>>>>>>>
run some
>>>>>>>>>>>>>>>> experiments
>>>>>>>>>>>>>>>>>>>>
first
>>>>>>>>>>>>>>>>>>>>>>
before we make any decision whatsoever.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
This is pretty much saying that lifetime markers are
>>>>>>>>>>>>>>>>
`memset(undef)`,
>>>>>>>>>>>>>>>>>> as
>>>>>>>>>>>>>>>>>>>>>>
you suggested before (I think).
>>>>>>>>>>>>>>>>>>>>>>
There are some implementation level differences but
>>>>>>>>>>>>>>>>>>>>>>
at the end of
>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>
day they are basically the same.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
Happy to hear some thoughts on this, especially if
>>>>>>>>>>>>>>>>>>>>>>
it fixes the
>>>>>>>>>>>>>>>>>>
problems
>>>>>>>>>>>>>>>>>>>>>>
that lead to D93376 in the first place.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
~ Johannes
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
On 12/18/20 2:42 AM, Juneyoung Lee via llvm-dev wrote:
>>>>>>>>>>>>>>>>>>>>>>>
Hello all,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
We're discussing the well-formedness of
>>>>>>>>>>>>>>>>>>>>>>>
lifetime.start/end
>>>>>>>>>>>>>>>> intrinsic
>>>>>>>>>>>>>>>>>>>>>>
here (
>>>>>>>>>>>>>>>>>>>>>>>
https://reviews.llvm.org/D93376), deciding what is a
>>>>>>>>>>>>>>>> (syntactically
>>>>>>>>>>>>>>>>>> &
>>>>>>>>>>>>>>>>>>>>>>>
semantically) valid use of these intrinsics.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
I'd like to gather more context about the intrinsics.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
1. Is there a frontend or framework that
>>>>>>>>>>>>>>>>>>>>>>>
introduces lifetime call
>>>>>>>>>>>>>>>>>> with
>>>>>>>>>>>>>>>>>>>>>>>
non-stack allocated objects?
>>>>>>>>>>>>>>>>>>>>>>>
If exists, which behavior do they expect from it?
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
2. Can a block's lifetime be started bytewise or
>>>>>>>>>>>>>>>>>>>>>>>
elementwise?
>>>>>>>>>>>>>>>>>>>>>>>
I imagine an optimization that allow using stack
>>>>>>>>>>>>>>>>>>>>>>>
very compactly,
>>>>>>>>>>>>>>>> but
>>>>>>>>>>>>>>>>>>>>>>
wonder
>>>>>>>>>>>>>>>>>>>>>>>
there is a real-world use case.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
Thanks,
>>>>>>>>>>>>>>>>>>>>>>>
Juneyoung
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
_______________________________________________
>>>>>>>>>>>>>>>>>>>>>>>
LLVM Developers mailing list
>>>>>>>>>>>>>>>>>>>>>>>
llvm-dev at lists.llvm.org
>>>>>>>>>>>>>>>>>>>>>>>
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>