On 1/24/21 10:07 AM, Ralf Jung wrote:> Hi Johannes,
>
>> At some point I lost track of what you even want to achieve.
>
> *lol* fair.^^
> All I want is a precise and meaningful spec for lifetime.start and
> lifetime.end. That's where we entered this rabbit hole. You didn't
> like Juneyoung's and Nuno's proposal for this and instead suggested
to
> replace the entire system by something else, and that new system is
> what we are discussing now (that is my understanding, anyway).
>
I went back to the proposal in question,
https://reviews.llvm.org/D93376, and I could not right away determine
why this would avoid the inconsistency we can imagine for something like
```
int *G = 0;
void bar(int *p) __noinline__ { if (G) print(G == p); else G = p;
void foo() {
intptr_t xp, yp;
{ int x; xp = &x; bar(&x); }
{ int y; yp = &y; bar(&y); }
print(xp == yp); // One can imagine this is folded to
false and because x and y are coalesced bar will print true.
}
```
I was opposing the proposal because of the brittle syntactic
restrictions. In my attempt to justify not having those I also tried to
resolve the above inconsistency in a more general manner because I
thought that was what it was all about. I believe that we either have to
argue the address of a dead object is unusable/undetermined, basically
the direction of James comment, or we need to decouple lifetime markers
from locations.
>> Your initial problem goes away if we only coalesce and fold
>> comparisons between allocations for which the addresses are not
>> observed.
>> Let's do that and thereby benefit from all the good things we can
get
>> from lifetime markers and coalescing, for alloca/malloc/...
>> Let's not go down the brittle road of syntactic limitations. It
>> doesn't allow us to do a lot of things and will come with a price
as
>> we continue.
>>
>> FWIW, I think some part of the misconception is that two allocas
>> "will definitely get different addresses" which you argue
"heavily
>> restricts the possible allocation choices compared to the original C
>> program".
>> The addresses are different but that is only interesting if you
>> observe them. Take the C code: `{int a,b, c = (&a == &b);
>> use(a,c);}`. I can happily coalesce `a` and `b` after the comparison
>> was folded, or `b` and `c` even if the comparison is still in place.
>> For neither transformation I need
>> lifetime markers to do it in IR.
>
> Sure, everything in the semantics is up to what the program actually
> observes. But I was considering the general case, in which it will not
> be possible to prove that some address is not observed. I expect that
> to be quite common (certainly, Rust programs really like to take
> references to local variables and pass them to other functions), and
> in those cases, not having lifetime.start/end will severely restrict
> program transformations -- as you observed, even outlining is affected.
>
> This is basically the abstract version of the concrete concern that
> James raised.
I get the argument that it can hinder optimizations but it is abstract.
It seems pretty simple to test this out, I mean to restrict the folding
and the coalescing. The benefit, which I think got ignored here a lot,
is that it provides a clear path towards deduction of coalescing
opportunities, that is, coalescing in the middle-end or backend w/o
lifetime markers. Though, similar to the impact of my proposal, I have
no idea if that is even needed.
~ Johannes
>
> Kind regards,
> Ralf
>
>> A lot of my arguments around alloca/malloc/... is that we can augment
>> the information from the frontend by determining if an address is
>> observed or not. If it is not, we can limit its allocation, coalesce
>> it, etc. Lifetime markers can help us do this but they are not the
>> only way.
>> So, basically, we do not need lifetime markers to coalesce
>> alloca/malloc/... but that they can help.
>
>
>
>>
>> Let's try to get out of this rabbit hole. If you have a problem
with
>> the proposal as outlined in
>> https://lists.llvm.org/pipermail/llvm-dev/2021-January/147594.html ,
>> please let me know. I assume dealing with observed addresses
>> different than with non-observede ones (implied by noescape) is the
>> way forward now.
>>
>> ~ Johannes
>>
>>
>>
>>>
>>> Kind regards,
>>> Ralf
>>>
>>>
>>>>
>>>>
>>>>>
>>>>> Kind regards,
>>>>> Ralf
>>>>>
>>>>>>
>>>>>> Does that make some sense?
>>>>>>
>>>>>> ~ Johannes
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> Kind regards,
>>>>>>> Ralf
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 1/7/21 10:12 PM, Juneyoung Lee wrote:
>>>>>>>>> On Fri, Jan 8, 2021 at 8:54 AM Johannes
Doerfert
>>>>>>>>> <johannesdoerfert at gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> On 1/7/21 4:55 PM, Juneyoung Lee wrote:
>>>>>>>>>>> It is, alloca must have its
integral address decided at its
>>>>>>>>>>> *allocation*
>>>>>>>>>>> time as well, like 0xFF00... .
>>>>>>>>>>>
>>>>>>>>>>> For example, if 0xFFFF0000 ~
0xFF000000 is already
>>>>>>>>>>> allocated, the new
>>>>>>>>>>> alloca cannot be placed there
unless it is not observed in
>>>>>>>>>>> the future,
>>>>>>>>>>> according to your proposal.
>>>>>>>>>>>
>>>>>>>>>>> But how do we know from the current
state whether the stack
>>>>>>>>>>> variable is
>>>>>>>>>>> going to be observed or not?
>>>>>>>>>> With that argument you could allocate
all but 4 bytes, do a
>>>>>>>>>> malloc(4)
>>>>>>>>>> and know the address of the returned
pointer (assuming it is
>>>>>>>>>> not null).
>>>>>>>>>>
>>>>>>>>>> What I try to say is, either your
scenario is part of the
>>>>>>>>>> model and
>>>>>>>>>> everything we do is broken as you could
"observe" addresses
>>>>>>>>>> passively,
>>>>>>>>>> *or*, the abstract machine we use for
semantic reasoning
>>>>>>>>>> doesn't permit
>>>>>>>>>> the above reasoning. I really hope for
the latter.
>>>>>>>>>>
>>>>>>>>> That's a very valid point..!
>>>>>>>>>
>>>>>>>>> Actually, I have a memory model that
addresses this problem.
>>>>>>>>> The gist is that we can interpret each
allocation instruction
>>>>>>>>> as *creating
>>>>>>>>> 2 blocks* and nondeterministically
returning one of them.
>>>>>>>>> The other one is marked as invalid but not
freed immediately.
>>>>>>>>> Deallocation
>>>>>>>>> frees both blocks.
>>>>>>>>> If there is not enough space for having two
blocks, it is
>>>>>>>>> treated as
>>>>>>>>> out-of-memory.
>>>>>>>>>
>>>>>>>>> It makes guessing the address of an
allocation according to
>>>>>>>>> memory layout
>>>>>>>>> invalid UB.
>>>>>>>>>
>>>>>>>>> p = malloc(4)
>>>>>>>>> // If p != null, it is guaranteed there was
at least two 4
>>>>>>>>> byte slots
>>>>>>>>> available, say 0x100~0x104 and 0x110~0x114.
>>>>>>>>> // Two blocks are allocated at 0x110 and
0x114, and one of the
>>>>>>>>> addresses is
>>>>>>>>> nondeterministically returned.
>>>>>>>>> // By the nature of nondeterminism, all
executions below
>>>>>>>>> should be
>>>>>>>>> well-defined regardless of the address of
p.
>>>>>>>>> *(int*)0x100 = 10; // In an execution where
p is 0x110, this
>>>>>>>>> raises UB.
>>>>>>>>>
>>>>>>>>> The paper link is here
>>>>>>>>>
<https://dl.acm.org/doi/10.1145/3276495> FYI.
>>>>>>>>
>>>>>>>> Nice! Thanks for the link :)
>>>>>>>>
>>>>>>>>
>>>>>>>>> To argue differently: Who is to say there
is a stack, or only
>>>>>>>>> one,
>>>>>>>>>> or that alloca allocates memory
"on the one stack"? That is
>>>>>>>>>> not part of
>>>>>>>>>> the IR, IMHO. I can write a machine on
which alloca lowers to
>>>>>>>>>> malloc,
>>>>>>>>>> I don't even need the free during
stack unwind but that I
>>>>>>>>>> could do as
>>>>>>>>>> well if I wanted to.
>>>>>>>>>
>>>>>>>>> This is right, the example was for
illustrative purposes. IR
>>>>>>>>> does not
>>>>>>>>> enforce alloca to be placed at
'stack'.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>> Making icmp/ptrtoint yield poison
will still make loop
>>>>>>>>>>> versioning or
>>>>>>>>>>> pointer rewriting transformations
unsound because these
>>>>>>>>>>> operations now
>>>>>>>>>> can
>>>>>>>>>>> create poison (even if pointers are
noundef).
>>>>>>>>>> I did not say they yield poison, at
least I did not try to
>>>>>>>>>> say that.
>>>>>>>>>> What are you referring to exactly?
>>>>>>>>>>
>>>>>>>>> I was referring to this:
>>>>>>>>>
>>>>>>>>>>> But this makes ptrtoint/icmp make
UB-raising instructions,
>>>>>>>>>>> which
>>>>>>>>>>> contradicts with what LLVM does.
>>>>>>>>>> As with other violation of attributes I
would, on first
>>>>>>>>>> though, suggest
>>>>>>>>>> to produce poison, not UB.
>>>>>>>>> But it is more about the (imaginary)
attribute, so maybe I was
>>>>>>>>> slightly out
>>>>>>>>> of topic.
>>>>>>>>>
>>>>>>>>>> 2) is fine, I think the suggestion
semantically makes sense
>>>>>>>>>> perfectly. 1)
>>>>>>>>>>> is something I'm concerned
about now.
>>>>>>>>>>>
>>>>>>>>>>> There are more than pointer
foldings, such as rewriting such
>>>>>>>>>>> expression,
>>>>>>>>>>> code motion ptr cmp, introduce ptr
cmp, etc. There is also
>>>>>>>>>>> analysis
>>>>>>>>>> relying
>>>>>>>>>>> on ptr cmp.
>>>>>>>>>>> Defining the correctness of each of
them is something we
>>>>>>>>>>> want to avoid,
>>>>>>>>>> and
>>>>>>>>>>> maybe that's why we want to
define precise semantics for
>>>>>>>>>>> things.
>>>>>>>>>> I don't get the point. My proposal
does not change the
>>>>>>>>>> semantics of
>>>>>>>>>> pointer comparisons, at all. I
explicitly mentioned that in
>>>>>>>>>> the last
>>>>>>>>>> email.
>>>>>>>>>>
>>>>>>>>> Oh okay, I thought it was a part of the
lifetime proposal, but
>>>>>>>>> it seems
>>>>>>>>> more like a separate thing.
>>>>>>>>> I agree that this requires performance
impact.
>>>>>>>>> Also investigation of existing
transformations would be
>>>>>>>>> needed; Alive2's
>>>>>>>>> pointer comparison is doing approximation
yet, but if it
>>>>>>>>> becomes fully
>>>>>>>>> precise, it will show something from
running LLVM unit tests I
>>>>>>>>> believe..! :)
>>>>>>>>>
>>>>>>>>>> I think this will be less aggressive
and may give nice
>>>>>>>>>> feedback to
>>>>>>>>>>> potential projects that are using
lifetime with non-alloca.
>>>>>>>>>> The lifetime marker debate, basically
2) above, is orthogonal
>>>>>>>>>> to the
>>>>>>>>>> problem
>>>>>>>>>> you try to solve. It got mixed in as
lifetime markers were
>>>>>>>>>> used by
>>>>>>>>>> StackColoring
>>>>>>>>>> to perform coalescing but that is
coincidental. You can
>>>>>>>>>> (arguably)
>>>>>>>>>> coalesce stack
>>>>>>>>>> allocations regardless of lifetime
markers and with 1) such a
>>>>>>>>>> transformation
>>>>>>>>>> (w/ and w/o lifetime markers) would
actually be sound.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> At the end, we can implement IR
writer that lowers lifetime
>>>>>>>>>>> with
>>>>>>>>>> non-alloca
>>>>>>>>>>> into memset(undef). WDYT?
>>>>>>>>>> Yeah, 2) is orthogonal and we can lower
it that way. Unsure
>>>>>>>>>> if it is
>>>>>>>>>> helpful
>>>>>>>>>> but we can certainly define it that way
in the LangRef.
>>>>>>>>>>
>>>>>>>>> Okay, thanks!
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> ~ Johannes
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Juneyoung
>>>>>>>>>>>
>>>>>>>>>>> p.s. The reply was late, sorry. I
think I can spend more
>>>>>>>>>>> time on this
>>>>>>>>>> today.
>>>>>>>>>>> On Thu, Jan 7, 2021 at 9:02 AM
Johannes Doerfert <
>>>>>>>>>> johannesdoerfert at gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> On 1/6/21 4:33 PM, Juneyoung
Lee wrote:
>>>>>>>>>>>>>>> Stepwisely defining
the semantics of instructions is a
>>>>>>>>>>>>>>> desirable
>>>>>>>>>>>>> direction
>>>>>>>>>>>>>>> IMO.
>>>>>>>>>>>>>> I'm confused. What
in the proposal would prevent us from
>>>>>>>>>>>>>> defining
>>>>>>>>>>>>>> the semantics of
instructions, or force us to do it in an
>>>>>>>>>>>>>> "undesirable
>>>>>>>>>>>>> way"?
>>>>>>>>>>>>>
>>>>>>>>>>>>> I meant it would be great
if the output state after
>>>>>>>>>>>>> executing an
>>>>>>>>>>>>> instruction can be
described using its input state.
>>>>>>>>>>>>> (that was the meaning of
'stepwise semantics', I should
>>>>>>>>>>>>> have been more
>>>>>>>>>>>>> clear about this)
>>>>>>>>>>>>>
>>>>>>>>>>>>> For example, the semantics
of 'z = add x y' can be defined
>>>>>>>>>>>>> as follows:
>>>>>>>>>>>>> Given an input state s,
next state s' = s[z -> s(x) + s(y)]
>>>>>>>>>>>>> where s(x) is the value of
x in the previous state, and
>>>>>>>>>>>>> s[z -> v] is a
>>>>>>>>>>>>> state with z updated to v.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Another example that
involves memory: the semantics of 'i
>>>>>>>>>>>>> = ptrtoint p'
>>>>>>>>>>>> can
>>>>>>>>>>>>> be defined as follows:
>>>>>>>>>>>>> Given an input state s,
next state s' = s[i ->
>>>>>>>>>>>>> s(p).obj.address +
>>>>>>>>>>>>> s(p).offset]
>>>>>>>>>>>>> where obj.address is the
begin address of a memory object
>>>>>>>>>>>>> obj pointed
>>>>>>>>>> by
>>>>>>>>>>>> p
>>>>>>>>>>>>> & offset is p's
byte offset. (Imagine a pointer to the
>>>>>>>>>>>>> offset of some
>>>>>>>>>>>> char
>>>>>>>>>>>>> array).
>>>>>>>>>>>>> Note that ptrtoint &
add can be nicely defined w.r.t the
>>>>>>>>>>>>> input state.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Now, the instruction that
we're facing is 'p = alloca'.
>>>>>>>>>>>>> To describe the output
state after executing 'p = alloca',
>>>>>>>>>>>>> the address
>>>>>>>>>> of
>>>>>>>>>>>>> new alloca should be
determined.
>>>>>>>>>>>>> If observedness is
involved, we need to know the future
>>>>>>>>>>>>> state again. :/
>>>>>>>>>>>> We
>>>>>>>>>>>>> don't know whether the
alloca is going to be observed or
>>>>>>>>>>>>> not without
>>>>>>>>>>>> seeing
>>>>>>>>>>>>> the future.
>>>>>>>>>>>>> This is the problem of the
current lifetime intrinsics as
>>>>>>>>>>>>> well.
>>>>>>>>>>>> No, you mix things up here.
>>>>>>>>>>>>
>>>>>>>>>>>> Nobody proposed to modify the
semantics of `alloca`.
>>>>>>>>>>>> `alloca` provides you with a
fresh, unobserved block of
>>>>>>>>>>>> dereferenceable memory that is
implicitly freed as the stack
>>>>>>>>>>>> unwinds. That is it. No context
necessary.
>>>>>>>>>>>>
>>>>>>>>>>>> If you want to modify the IR,
you need to argue the observable
>>>>>>>>>>>> semantics which is nothing new.
That this might require
>>>>>>>>>>>> more than
>>>>>>>>>>>> a peephole view of the program
is also not new.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> One possible approach to
resolve this is adding an
>>>>>>>>>>>>> 'unobserved' flag
to
>>>>>>>>>>>>> alloca instruction (similar
to what was suggested by
>>>>>>>>>>>>> Nicolai).
>>>>>>>>>>>>> And, we can say that if
alloca with 'unobserved' is used by
>>>>>>>>>>>> ptrtoint/icmp,
>>>>>>>>>>>>> it is UB.
>>>>>>>>>>>> The flag can be added, like we
add other attributes. It
>>>>>>>>>>>> should not
>>>>>>>>>>>> be required for any
optimization we talked about though. It
>>>>>>>>>>>> basically
>>>>>>>>>>>> is a way to manifest derived or
given information into the IR.
>>>>>>>>>>>>
>>>>>>>>>>>> Attribute deduction, as well as
frontends with domain
>>>>>>>>>>>> knowledge,
>>>>>>>>>>>> can add such information. The
flag we discussed in phab was
>>>>>>>>>>>> not even
>>>>>>>>>>>> sufficient for all the
transformation examples I presented
>>>>>>>>>>>> in my mail,
>>>>>>>>>>>> that is why I extended my
argument. We could still have a
>>>>>>>>>>>> "noescape"
>>>>>>>>>>>> flag for allocas, but I'm
not sure how useful that really
>>>>>>>>>>>> is. We can
>>>>>>>>>>>> certainly deduce it and
manifest it, unsure if we have
>>>>>>>>>>>> domain knowledge
>>>>>>>>>>>> we can use for non-trivial
cases though.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> But this makes
ptrtoint/icmp make UB-raising instructions,
>>>>>>>>>>>>> which
>>>>>>>>>>>>> contradicts with what LLVM
does.
>>>>>>>>>>>> As with other violation of
attributes I would, on first
>>>>>>>>>>>> though, suggest
>>>>>>>>>>>> to produce poison, not UB.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> Also, existing
optimizations like loop versioning can
>>>>>>>>>>>>> introduce
>>>>>>>>>>>>> ptrtoint/pointer
comparisons too.
>>>>>>>>>>>> Sure. I am not certain why that
is a problem. I get the
>>>>>>>>>>>> feeling things
>>>>>>>>>>>> are still mixed up here.
>>>>>>>>>>>>
>>>>>>>>>>>> What I proposed is twofold:
>>>>>>>>>>>>
>>>>>>>>>>>> 1) We stop folding comparisons
between different allocas if
>>>>>>>>>>>> changing the
>>>>>>>>>>>> address
>>>>>>>>>>>> of both might be
observable. Thus, if both might have
>>>>>>>>>>>> their address
>>>>>>>>>>>> "taken"/escaped,
>>>>>>>>>>>> other than the
comparisons we want to fold, we cannot
>>>>>>>>>>>> proceed.
>>>>>>>>>>>>
>>>>>>>>>>>> 2) We define lifetime markers
to mean `memset(undef)`.
>>>>>>>>>>>>
>>>>>>>>>>>> The first should be sufficient
for the problem you were
>>>>>>>>>>>> trying to solve
>>>>>>>>>>>> in the
>>>>>>>>>>>> first place. The second makes
lifetime markers less weird.
>>>>>>>>>>>> Note that 1)
>>>>>>>>>>>> is not changing
>>>>>>>>>>>> the semantics of the IR. We
basically just argue there is a
>>>>>>>>>>>> bug in our
>>>>>>>>>>>> instcombine right
>>>>>>>>>>>> now as we do not check all
necessary preconditions.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> I see that there are other
questions that I didn't answer
>>>>>>>>>>>>> yet, but let
>>>>>>>>>> me
>>>>>>>>>>>>> answer this first to limit
the length of the text :)
>>>>>>>>>>>> Sure, we can split the
discussion :)
>>>>>>>>>>>>
>>>>>>>>>>>> ~ Johannes
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Juneyoung
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Jan 7, 2021 at 3:36
AM Johannes Doerfert <
>>>>>>>>>>>> johannesdoerfert at
gmail.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 1/5/21 8:00 PM,
Juneyoung Lee wrote:
>>>>>>>>>>>>>>> Hi Johannes,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I read your
proposal and thought about the model.
>>>>>>>>>>>>>> Cool, thanks!
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> As you concerned in
A3, certain programs may be valid
>>>>>>>>>>>>>>> only when
>>>>>>>>>> memory
>>>>>>>>>>>>>>> blocks with
overlapping lifetimes have disjoint addresses.
>>>>>>>>>>>>>>> Look at this
example (I'm using malloc, but alloca also
>>>>>>>>>>>>>>> works):
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> p1 = malloc(4)
>>>>>>>>>>>>>>> p2 = malloc(4) //
for brevity, assume that there as
>>>>>>>>>>>>>>> enough space &
p1
>>>>>>>>>>>> and
>>>>>>>>>>>>>>> p2 != null
>>>>>>>>>>>>>>> set<char*> s;
>>>>>>>>>>>>>>> s.insert(p1);
s.insert(p2); // If the second insert did
>>>>>>>>>>>>>>> nothing, it
>>>>>>>>>>>> would
>>>>>>>>>>>>>>> be surprise to
programmers
>>>>>>>>>>>>>>> work(s);
>>>>>>>>>>>>>>> free(data1)
>>>>>>>>>>>>>>> free(data2)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Clearly, IR
semantics should guarantee that escaped
>>>>>>>>>>>>>>> blocks are
>>>>>>>>>>>> disjoint.
>>>>>>>>>>>>>> It
>>>>>>>>>>>>>>> would be great for
verification tools on LLVM IR to be
>>>>>>>>>>>>>>> able to answer
>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>> the second insert
will succeed.
>>>>>>>>>>>>>> I agree, the second
insert should succeed, assuming `p1
>>>>>>>>>>>>>> && p2`.
>>>>>>>>>>>>>> I don't think my
proposal would in any way impact the
>>>>>>>>>>>>>> program above,
>>>>>>>>>>>>>> if anything the A3
reasoning makes sure such a program
>>>>>>>>>>>>>> with allocas
>>>>>>>>>>>>>> is not miscompiled.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I'm also not sure I
understand what you try to argue for.
>>>>>>>>>>>>>> Maybe
>>>>>>>>>>>>>> elaborate a bit what it
is you think is bad or needs to
>>>>>>>>>>>>>> be changed.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The problem is that
definition of escapedness is not
>>>>>>>>>>>>>>> clear at the
>>>>>>>>>>>>>> semantic
>>>>>>>>>>>>>>> level. Describing
the IR semantics w.r.t. LLVM's escaped
>>>>>>>>>>>>>>> analysis
>>>>>>>>>>>>>> function
>>>>>>>>>>>>>>> isn't something
we want.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Semantic definition
of escapedness of a pointer seems
>>>>>>>>>>>>>>> hard, I mean
>>>>>>>>>> in a
>>>>>>>>>>>>>>> stepwise manner.
>>>>>>>>>>>>>>> It isn't a
single instruction such as 'escape i8* ptr',
>>>>>>>>>>>>>>> and we need
>>>>>>>>>> to
>>>>>>>>>>>>>> look
>>>>>>>>>>>>>>> over all
instructions in the function. For example,
>>>>>>>>>>>>>>> '(int)(p+1) -
>>>>>>>>>>>> (int)p'
>>>>>>>>>>>>>>> isn't
semantically escaping the pointer p because the
>>>>>>>>>>>>>>> result is 1
>>>>>>>>>>>>>>> regardless of the
value of p.
>>>>>>>>>>>>>>> Stepwisely defining
the semantics of instructions is a
>>>>>>>>>>>>>>> desirable
>>>>>>>>>>>>>> direction
>>>>>>>>>>>>>>> IMO.
>>>>>>>>>>>>>> I'm confused. What
in the proposal would prevent us from
>>>>>>>>>>>>>> defining
>>>>>>>>>>>>>> the semantics of
instructions, or force us to do it in an
>>>>>>>>>>>>>> "undesirable
>>>>>>>>>>>>>> way"?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> In practice,
syntactically checking escapedness + nice
>>>>>>>>>>>>>>> engineering
>>>>>>>>>>>> might
>>>>>>>>>>>>>>> not break
optimizations in most cases (as undef/poison
>>>>>>>>>>>>>>> did); but it
>>>>>>>>>>>> would
>>>>>>>>>>>>>>> be great to move to
another level, since LLVM IR is used
>>>>>>>>>>>>>>> in so many
>>>>>>>>>>>>>> places
>>>>>>>>>>>>>>> :)
>>>>>>>>>>>>>> The property under
which you can coalesce objects is simple:
>>>>>>>>>>>>>> It is not
observable.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Now, if the address of
one of the two objects you
>>>>>>>>>>>>>> coalesce is not
>>>>>>>>>>>>>> observed, coalescing is
not observable. That is a
>>>>>>>>>>>>>> sufficient condition
>>>>>>>>>>>>>> not a necessary one.
Pointer "escaping" is one step
>>>>>>>>>>>>>> further. If the
>>>>>>>>>>>>>> address doesn't
escape it is not observed. This does not
>>>>>>>>>>>>>> mean the
>>>>>>>>>>>>>> "semantic
conditions for coalescing", e.g., for the
>>>>>>>>>>>>>> purpose of
>>>>>>>>>>>> translation
>>>>>>>>>>>>>> validation, is supposed
to be build on top of our
>>>>>>>>>>>>>> "definition of
>>>>>>>>>>>> escaping
>>>>>>>>>>>>>> pointers". That
said, we use "does not escape" as a
>>>>>>>>>>>>>> precondition for
>>>>>>>>>>>>>> various transformation
and I'm unsure what is any
>>>>>>>>>>>>>> different now. The
>>>>>>>>>>>>>> entire escape argument
is only used in the validity of
>>>>>>>>>>>>>> the pointer
>>>>>>>>>>>> folding.
>>>>>>>>>>>>>> Similarly, we can fold
a comparison of a noalias pointer
>>>>>>>>>>>>>> with another
>>>>>>>>>>>> one
>>>>>>>>>>>>>> if the former does not
escape (and both are dereferenced
>>>>>>>>>>>>>> and one is
>>>>>>>>>>>>>> written for sure).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The pointer
comparison is another beast. It indeed has a
>>>>>>>>>>>>>>> few issues,
>>>>>>>>>>>> and
>>>>>>>>>>>>>>> solving it might
require nontrivial solution.
>>>>>>>>>>>>>> I think the main
problem of the inconsistencies and such
>>>>>>>>>>>>>> we've seen is
>>>>>>>>>>>>>> rooted in the erroneous
folding of pointer comparisons.
>>>>>>>>>>>>>> Cleaning up
>>>>>>>>>> the
>>>>>>>>>>>>>> lifetime marker
semantics is actually unrelated and
>>>>>>>>>>>>>> simply not folding
>>>>>>>>>>>>>> as described in A3
should solve the issue that has been
>>>>>>>>>>>>>> reported.
>>>>>>>>>>>>>> Nevertheless,
>>>>>>>>>>>>>> we should take a crack
at lifetime markers while we are
>>>>>>>>>>>>>> here.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ~ Johannes
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> Juneyoung
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tue, Jan 5, 2021
at 9:37 AM Johannes Doerfert <
>>>>>>>>>>>>>> johannesdoerfert at
gmail.com>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi Juneyoung,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Happy new year
:)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> After we had a
lengthy discussion on phab last year,
>>>>>>>>>>>>>>>> I've tried
to
>>>>>>>>>>>>>>>> summarize my
thoughts,
>>>>>>>>>>>>>>>> especially
given that I had some time to think about
>>>>>>>>>>>>>>>> things over the
>>>>>>>>>>>>>> break.
>>>>>>>>>>>>>>>> Still, no
promises on the quality ;)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I start with
general questions I asked myself and then
>>>>>>>>>>>>>>>> go on
>>>>>>>>>> rambling
>>>>>>>>>>>>>>>> about a
potential design.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Q1: Is lifetime
a given property or a derived one, thus
>>>>>>>>>>>>>>>> is it fixed
>>>>>>>>>> or
>>>>>>>>>>>>>>>> modifiable?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> This is a
question I asked myself a lot recently. I
>>>>>>>>>>>>>>>> think it is
>>>>>>>>>>>> derived
>>>>>>>>>>>>>>>> and modifiable,
>>>>>>>>>>>>>>>> at least I hope
it is. Only that would allow
>>>>>>>>>>>>>>>> transformations
I would
>>>>>>>>>>>>>> want
>>>>>>>>>>>>>>>> us to do. Here
are some examples:
>>>>>>>>>>>>>>>>
https://godbolt.org/z/G8obj3
>>>>>>>>>>>>>>>>
https://godbolt.org/z/obaTc
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Q2: Is a
pointer comparison, or similar use, extending
>>>>>>>>>>>>>>>> the lifetime?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Asked
differently, can we hoist a pointer comparison
>>>>>>>>>>>>>>>> into a region
>>>>>>>>>>>> where
>>>>>>>>>>>>>>>> the pointer is
dead?
>>>>>>>>>>>>>>>> This is an
important question which we haven't
>>>>>>>>>>>>>>>> discussed much
as we
>>>>>>>>>>>>>>>> assumed LICM
has to work.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The current
behavior is that non-dereferencing uses are
>>>>>>>>>>>>>>>> not
>>>>>>>>>> extending
>>>>>>>>>>>>>>>> the lifetime
and are
>>>>>>>>>>>>>>>> allowed outside
of "lifetime regions" (as indicated by
>>>>>>>>>>>>>>>> markers).
>>>>>>>>>> They
>>>>>>>>>>>>>>>> will always
produce valid
>>>>>>>>>>>>>>>> results.
Though, we might want to think about a
>>>>>>>>>>>>>>>> lifetime marker
that
>>>>>>>>>>>>>>>> spits out a new
pointer
>>>>>>>>>>>>>>>> value instead
of reusing the old one.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Q3: Can we use
lifetime to argue about addresses?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The question
here is, can we fold address comparisons
>>>>>>>>>>>>>>>> based on
>>>>>>>>>>>>>>>> lifetimes, or
not.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> So far, we fold
comparisons based on "address
>>>>>>>>>>>>>>>>
information". For
>>>>>>>>>>>> example,
>>>>>>>>>>>>>>>> we
"know" globals,
>>>>>>>>>>>>>>>> allocas, and
mallocs cannot be equal to one another.
>>>>>>>>>>>>>>>> Also, two
>>>>>>>>>>>> distinct
>>>>>>>>>>>>>>>> allocations,
for globals
>>>>>>>>>>>>>>>> and allocas,
are considered unequal. Now, the crux is
>>>>>>>>>>>>>>>> that we have
>>>>>>>>>> to
>>>>>>>>>>>> be
>>>>>>>>>>>>>>>> consistent if
we do two
>>>>>>>>>>>>>>>> comparisons,
and, as of now, we are not (bug number
>>>>>>>>>>>>>>>> missing). Since
>>>>>>>>>>>> the
>>>>>>>>>>>>>>>> backend (or any
other place
>>>>>>>>>>>>>>>> for that
matter) might coalesce allocas, their
>>>>>>>>>>>>>>>> addresses might
not
>>>>>>>>>> be
>>>>>>>>>>>>>>>> different after
all. If we
>>>>>>>>>>>>>>>> already folded
a comparison to "unequal" we are doomed
>>>>>>>>>>>>>>>> if we later
>>>>>>>>>>>> have
>>>>>>>>>>>>>>>> a comparison
that results
>>>>>>>>>>>>>>>> in
"equal". (Note, this is different from aliasing
>>>>>>>>>>>>>>>> rules as they
can
>>>>>>>>>>>> be
>>>>>>>>>>>>>>>> stricter.)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Design:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I would hope we
can come up with a model that treats
>>>>>>>>>>>>>>>> memory
"the
>>>>>>>>>>>> same",
>>>>>>>>>>>>>>>> regardless if
it is global,
>>>>>>>>>>>>>>>> stack, or heap.
I want to avoid special casing one of
>>>>>>>>>>>>>>>> them wrt.
>>>>>>>>>>>> lifetime
>>>>>>>>>>>>>>>> as I believe
most optimizations
>>>>>>>>>>>>>>>> would apply to
any of them, potentially for different
>>>>>>>>>>>>>>>> reasons and
>>>>>>>>>> with
>>>>>>>>>>>>>>>> different
gains, but nevertheless.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Proposal
(largely based on the conversation in phab):
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> A1: Lifetime is
a concept that talks about memory
>>>>>>>>>>>>>>>> content *only*.
>>>>>>>>>>>>>>>> Basically, the
memory content is set to
>>>>>>>>>>>>>>>>
undefined by lifetime markers. It is
>>>>>>>>>>>>>>>>
derived/modifiable as
>>>>>>>>>> it
>>>>>>>>>>>> is
>>>>>>>>>>>>>>>> just an
"as-is" property of the memory
>>>>>>>>>>>>>>>>
content. The lifetimes of an object, as
>>>>>>>>>>>>>>>> described by
>>>>>>>>>> markers,
>>>>>>>>>>>>>> might
>>>>>>>>>>>>>>>> change during
the compilation. They
>>>>>>>>>>>>>>>> might
become smaller if we deduce the object
>>>>>>>>>>>>>>>> is not accessed
>>>>>>>>>>>> and
>>>>>>>>>>>>>>>> the memory
content is not used, they
>>>>>>>>>>>>>>>> might
become larger if objects with
>>>>>>>>>>>>>>>> non-overlapping
>>>>>>>>>> lifetimes
>>>>>>>>>>>> are
>>>>>>>>>>>>>>>> coalesced. (One
could see the latter as
>>>>>>>>>>>>>>>>
introducing a new object though.)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> A2: If we
define lifetime as above, it has nothing to
>>>>>>>>>>>>>>>> do with the
>>>>>>>>>>>>>>>> address of an
object. Consequently, pointer
>>>>>>>>>>>>>>>>
comparisons and similar operations are valid
>>>>>>>>>>>>>>>> outside the
>>>>>>>>>>>> lifetime.
>>>>>>>>>>>>>>>> Loads and
stores are as well, they can
>>>>>>>>>>>>>>>> even
not be removed "easily". A store followed
>>>>>>>>>>>>>>>> by a lifetime
>>>>>>>>>>>>>> marker
>>>>>>>>>>>>>>>> or a load
following a lifetime marker
>>>>>>>>>>>>>>>> is
dead or results in undef respectively.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> A3: We could
not use lifetime to argue about addresses.
>>>>>>>>>>>>>>>> This means
>>>>>>>>>> we
>>>>>>>>>>>>>>>> could/should
also not argue that overlapping
>>>>>>>>>>>>>>>>
lifetimes result in different addresses. Thus,
>>>>>>>>>>>>>>>> a comparison
>>>>>>>>>>>>>> between
>>>>>>>>>>>>>>>> the address of
two allocas could not
>>>>>>>>>>>>>>>>
immediately be folded. That said, there would
>>>>>>>>>>>>>>>> be special
>>>>>>>>>> cases
>>>>>>>>>>>>>>>> though.
Basically, if one of the allocas does
>>>>>>>>>>>>>>>> not
escape, other than the comparisons to be
>>>>>>>>>>>>>>>> folded, we can
>>>>>>>>>>>> fold
>>>>>>>>>>>>>>>> them.
Afterwards, coalescing or splitting
>>>>>>>>>>>>>>>> would
still be consistent because it is
>>>>>>>>>>>>>>>> unobservable.
The
>>>>>>>>>>>> problem
>>>>>>>>>>>>>>>> we have in-tree
is that we fold even though
>>>>>>>>>>>>>>>> the
address is still observed (after the
>>>>>>>>>>>>>>>> fold). It is
still
>>>>>>>>>>>>>> unclear
>>>>>>>>>>>>>>>> to me what the
impact of this would be
>>>>>>>>>>>>>>>> on
real code. I suggested before that we run some
>>>>>>>>>> experiments
>>>>>>>>>>>>>> first
>>>>>>>>>>>>>>>> before we make
any decision whatsoever.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> This is pretty
much saying that lifetime markers are
>>>>>>>>>> `memset(undef)`,
>>>>>>>>>>>> as
>>>>>>>>>>>>>>>> you suggested
before (I think).
>>>>>>>>>>>>>>>> There are some
implementation level differences but at
>>>>>>>>>>>>>>>> the end of
>>>>>>>>>> the
>>>>>>>>>>>>>>>> day they are
basically the same.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Happy to hear
some thoughts on this, especially if it
>>>>>>>>>>>>>>>> fixes the
>>>>>>>>>>>> problems
>>>>>>>>>>>>>>>> that lead to
D93376 in the first place.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> ~ Johannes
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 12/18/20
2:42 AM, Juneyoung Lee via llvm-dev wrote:
>>>>>>>>>>>>>>>>> Hello all,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> We're
discussing the well-formedness of
>>>>>>>>>>>>>>>>>
lifetime.start/end
>>>>>>>>>> intrinsic
>>>>>>>>>>>>>>>> here (
>>>>>>>>>>>>>>>>>
https://reviews.llvm.org/D93376), deciding what is a
>>>>>>>>>> (syntactically
>>>>>>>>>>>> &
>>>>>>>>>>>>>>>>>
semantically) valid use of these intrinsics.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I'd
like to gather more context about the intrinsics.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 1. Is there
a frontend or framework that introduces
>>>>>>>>>>>>>>>>> lifetime
call
>>>>>>>>>>>> with
>>>>>>>>>>>>>>>>> non-stack
allocated objects?
>>>>>>>>>>>>>>>>> If exists,
which behavior do they expect from it?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 2. Can a
block's lifetime be started bytewise or
>>>>>>>>>>>>>>>>>
elementwise?
>>>>>>>>>>>>>>>>> I imagine
an optimization that allow using stack very
>>>>>>>>>>>>>>>>> compactly,
>>>>>>>>>> but
>>>>>>>>>>>>>>>> wonder
>>>>>>>>>>>>>>>>> there is a
real-world use case.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>> Juneyoung
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
_______________________________________________
>>>>>>>>>>>>>>>>> LLVM
Developers mailing list
>>>>>>>>>>>>>>>>> llvm-dev at
lists.llvm.org
>>>>>>>>>>>>>>>>>
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>>>>>>
>>>>>>>
>>>>>
>>>
>