On 1/11/21 9:48 AM, Ralf Jung wrote:> Hi Johannes,
>
>>>> A3: We could not use lifetime to argue about addresses. This
means
>>>> we could/should also not argue that overlapping
>>>> lifetimes result in different addresses. Thus, a
comparison
>>>> between the address of two allocas could not
>>>> immediately be folded. That said, there would be special
cases
>>>> though. Basically, if one of the allocas does
>>>> not escape, other than the comparisons to be folded, we
can
>>>> fold them. Afterwards, coalescing or splitting
>>>> would still be consistent because it is unobservable. The
>>>> problem we have in-tree is that we fold even though
>>>> the address is still observed (after the fold). It is
still
>>>> unclear to me what the impact of this would be
>>>> on real code. I suggested before that we run some
experiments
>>>> first before we make any decision whatsoever.
>>>
>>> I hope this question has not been answered yet, but I don't see
how
>>> that fold could be legal. I asked the same question on Phabricator
>>> but it seems you prefer to have the discussion here. Taking your
>>> example from there and adjusting it:
>>>
>>> p = malloc(1)
>>> q = malloc(1)
>>> %c = icmp eq %p, %q
>>> free(q)
>>> free(p)
>>>
>>> I think there is a guarantee that c will always be
"false". Every
>>> operational model of allocation that I have ever seen will
guarantee
>>> this, and the same for every program logic that I can imagine. So
if
>>> the compiler may fold this to "false", then as far as I
can see,
>>> pointer comparison becomes entirely unpredictable. The only
>>> consistent model I can think of is "icmp on pointers may
spuriously
>>> return 'true' at any time", which I doubt anyone
wants. ;)
>>>
>>> Am I misunderstanding what you are proposing here?
>>
>> I'm confused. In your text I read: It should be "false"
but *not*
>> folded to "false" or else ...
>
> *oops* sorry for that, I meant "if the compiler may fold this to
'true'".
>
>> FWIW, I would expect `%c = false` here.
>
> Good, we agree then.
>
>> The tricky part is, do we see `%c = false` at compile time
>> or at runtime. If it's the former, we might run into the same
>> coalescing issue we have for allocas:
>>
>> p = malloc(1)
>> free(q)
>> q = malloc(1)
>> free(p)
>> %c = icmp eq %p, %q
>>
>> What is `%c` now? It could go either way, depending on your
>> allocator, agreed?
>
> Agreed.
>
>> If so, we need to ensure
>> there is not a second `%c2 = icmp eq %p, %q` somewhere if we want to
>> fold `%c` as it needs to be consistent.
>
> Indeed -- if we constrain the non-deterministic choice that the
> allocator is making, we need to do so consistently.
>
> On Phabricator, you wrote
>
>> We could instead coalesce allocas in the middle end (more
>> aggressively and not only based on lifetime markers) in order to fold
>> comparisons of the "then equal" allocas to "equal".
>
> So, to me this sounded like you wanted to fold `icmp` to `true` in
> situations like in my example. It looks like I misunderstood.
> So is your proposal instead that we can coalesce allocations *after*
> we have ensured that there definitely is no `icmp` on their pointer
> any more? Like, for local variables, we first fold away all `icmp` and
> then we can freely coalesce since now it's not observable that we are
> reusing memory?
Let me start with:
My phab response is "stale" by now. My first email to this thread was
written after I had more time to think about all this:
https://lists.llvm.org/pipermail/llvm-dev/2021-January/147594.html
Separation of concerns, yes. If you can't observe we coalesced, we can
coalesce, or choose not to. My main argument is that the folding we do
right now is broken, or the coalescing is. We cannot fold, coalesce, and
then fold again (which could be runtime eval), as that leads to
inconsistencies.
>
> However, if we translate alloca to malloc+free, all (almost all?) of
> these allocations *will* overlap in their lifetime, so 'false'
> (inequal) is the only thing we could ever fold anything to (and we
> don't have to worry about consistency here since this is the only
> possible choice). I guess after we did that, if there's nothing else
> observing these addresses, we could indeed actually reuse the
> underlying storage...
>
> Am I on the right track?
I guess initially we would only fold to `inequal`, which is fine, until
we start to coalesce locations with non-overlapping lifetime early.
Though, now that I said this, I'm not even sure we could ever fold after
coalescing, thus we could ever coalesce if the addresses (of both)
are "still observed". Maybe folding is not the problem but coalescing
is, though I think both are in the most general sense. I think I can
fabricate a reasonable flow of transformation in which early *and*
partial folding is bad:
a = alloc
b = alloc
... // no use of a or b
foo(a)
... // no use of a or b
bar(b)
... // no use of a or b
c = (a == b)
now we assume that the two calls are outlined after we fold `(a == b)`
to `false`:
void foocall() {
a = alloc
foo(a)
}
void barcall() {
a = alloc
bar(a)
}
foocall()
barcall()
c = true
now we assume foo stores away `a` and bar compares it to `b`. That would
be a problem.
WDYT?
>
> Kind regards,
> Ralf
>
>>
>> Does that make some sense?
>>
>> ~ Johannes
>>
>>
>>>
>>> Kind regards,
>>> Ralf
>>>
>>>>
>>>>
>>>> On 1/7/21 10:12 PM, Juneyoung Lee wrote:
>>>>> On Fri, Jan 8, 2021 at 8:54 AM Johannes Doerfert
>>>>> <johannesdoerfert at gmail.com>
>>>>> wrote:
>>>>>
>>>>>> On 1/7/21 4:55 PM, Juneyoung Lee wrote:
>>>>>>> It is, alloca must have its integral address
decided at its
>>>>>>> *allocation*
>>>>>>> time as well, like 0xFF00... .
>>>>>>>
>>>>>>> For example, if 0xFFFF0000 ~ 0xFF000000 is already
allocated,
>>>>>>> the new
>>>>>>> alloca cannot be placed there unless it is not
observed in the
>>>>>>> future,
>>>>>>> according to your proposal.
>>>>>>>
>>>>>>> But how do we know from the current state whether
the stack
>>>>>>> variable is
>>>>>>> going to be observed or not?
>>>>>> With that argument you could allocate all but 4 bytes,
do a
>>>>>> malloc(4)
>>>>>> and know the address of the returned pointer (assuming
it is not
>>>>>> null).
>>>>>>
>>>>>> What I try to say is, either your scenario is part of
the model and
>>>>>> everything we do is broken as you could
"observe" addresses
>>>>>> passively,
>>>>>> *or*, the abstract machine we use for semantic
reasoning doesn't
>>>>>> permit
>>>>>> the above reasoning. I really hope for the latter.
>>>>>>
>>>>> That's a very valid point..!
>>>>>
>>>>> Actually, I have a memory model that addresses this
problem.
>>>>> The gist is that we can interpret each allocation
instruction as
>>>>> *creating
>>>>> 2 blocks* and nondeterministically returning one of them.
>>>>> The other one is marked as invalid but not freed
immediately.
>>>>> Deallocation
>>>>> frees both blocks.
>>>>> If there is not enough space for having two blocks, it is
treated as
>>>>> out-of-memory.
>>>>>
>>>>> It makes guessing the address of an allocation according to
memory
>>>>> layout
>>>>> invalid UB.
>>>>>
>>>>> p = malloc(4)
>>>>> // If p != null, it is guaranteed there was at least two 4
byte slots
>>>>> available, say 0x100~0x104 and 0x110~0x114.
>>>>> // Two blocks are allocated at 0x110 and 0x114, and one of
the
>>>>> addresses is
>>>>> nondeterministically returned.
>>>>> // By the nature of nondeterminism, all executions below
should be
>>>>> well-defined regardless of the address of p.
>>>>> *(int*)0x100 = 10; // In an execution where p is 0x110,
this
>>>>> raises UB.
>>>>>
>>>>> The paper link is here
<https://dl.acm.org/doi/10.1145/3276495> FYI.
>>>>
>>>> Nice! Thanks for the link :)
>>>>
>>>>
>>>>> To argue differently: Who is to say there is a stack, or
only one,
>>>>>> or that alloca allocates memory "on the one
stack"? That is not
>>>>>> part of
>>>>>> the IR, IMHO. I can write a machine on which alloca
lowers to
>>>>>> malloc,
>>>>>> I don't even need the free during stack unwind but
that I could
>>>>>> do as
>>>>>> well if I wanted to.
>>>>>
>>>>> This is right, the example was for illustrative purposes.
IR does not
>>>>> enforce alloca to be placed at 'stack'.
>>>>>
>>>>>
>>>>>>> Making icmp/ptrtoint yield poison will still make
loop
>>>>>>> versioning or
>>>>>>> pointer rewriting transformations unsound because
these
>>>>>>> operations now
>>>>>> can
>>>>>>> create poison (even if pointers are noundef).
>>>>>> I did not say they yield poison, at least I did not try
to say that.
>>>>>> What are you referring to exactly?
>>>>>>
>>>>> I was referring to this:
>>>>>
>>>>>>> But this makes ptrtoint/icmp make UB-raising
instructions, which
>>>>>>> contradicts with what LLVM does.
>>>>>> As with other violation of attributes I would, on first
though,
>>>>>> suggest
>>>>>> to produce poison, not UB.
>>>>> But it is more about the (imaginary) attribute, so maybe I
was
>>>>> slightly out
>>>>> of topic.
>>>>>
>>>>>> 2) is fine, I think the suggestion semantically makes
sense
>>>>>> perfectly. 1)
>>>>>>> is something I'm concerned about now.
>>>>>>>
>>>>>>> There are more than pointer foldings, such as
rewriting such
>>>>>>> expression,
>>>>>>> code motion ptr cmp, introduce ptr cmp, etc. There
is also analysis
>>>>>> relying
>>>>>>> on ptr cmp.
>>>>>>> Defining the correctness of each of them is
something we want to
>>>>>>> avoid,
>>>>>> and
>>>>>>> maybe that's why we want to define precise
semantics for things.
>>>>>> I don't get the point. My proposal does not change
the semantics of
>>>>>> pointer comparisons, at all. I explicitly mentioned
that in the last
>>>>>> email.
>>>>>>
>>>>> Oh okay, I thought it was a part of the lifetime proposal,
but it
>>>>> seems
>>>>> more like a separate thing.
>>>>> I agree that this requires performance impact.
>>>>> Also investigation of existing transformations would be
needed;
>>>>> Alive2's
>>>>> pointer comparison is doing approximation yet, but if it
becomes
>>>>> fully
>>>>> precise, it will show something from running LLVM unit
tests I
>>>>> believe..! :)
>>>>>
>>>>>> I think this will be less aggressive and may give nice
feedback to
>>>>>>> potential projects that are using lifetime with
non-alloca.
>>>>>> The lifetime marker debate, basically 2) above, is
orthogonal to the
>>>>>> problem
>>>>>> you try to solve. It got mixed in as lifetime markers
were used by
>>>>>> StackColoring
>>>>>> to perform coalescing but that is coincidental. You can
(arguably)
>>>>>> coalesce stack
>>>>>> allocations regardless of lifetime markers and with 1)
such a
>>>>>> transformation
>>>>>> (w/ and w/o lifetime markers) would actually be sound.
>>>>>>
>>>>>>
>>>>>>> At the end, we can implement IR writer that lowers
lifetime with
>>>>>> non-alloca
>>>>>>> into memset(undef). WDYT?
>>>>>> Yeah, 2) is orthogonal and we can lower it that way.
Unsure if it is
>>>>>> helpful
>>>>>> but we can certainly define it that way in the LangRef.
>>>>>>
>>>>> Okay, thanks!
>>>>>
>>>>>
>>>>>> ~ Johannes
>>>>>>
>>>>>>
>>>>>>
>>>>>>> Thanks,
>>>>>>> Juneyoung
>>>>>>>
>>>>>>> p.s. The reply was late, sorry. I think I can spend
more time on
>>>>>>> this
>>>>>> today.
>>>>>>> On Thu, Jan 7, 2021 at 9:02 AM Johannes Doerfert
<
>>>>>> johannesdoerfert at gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> On 1/6/21 4:33 PM, Juneyoung Lee wrote:
>>>>>>>>>>> Stepwisely defining the semantics
of instructions is a
>>>>>>>>>>> desirable
>>>>>>>>> direction
>>>>>>>>>>> IMO.
>>>>>>>>>> I'm confused. What in the proposal
would prevent us from
>>>>>>>>>> defining
>>>>>>>>>> the semantics of instructions, or force
us to do it in an
>>>>>>>>>> "undesirable
>>>>>>>>> way"?
>>>>>>>>>
>>>>>>>>> I meant it would be great if the output
state after executing an
>>>>>>>>> instruction can be described using its
input state.
>>>>>>>>> (that was the meaning of 'stepwise
semantics', I should have
>>>>>>>>> been more
>>>>>>>>> clear about this)
>>>>>>>>>
>>>>>>>>> For example, the semantics of 'z = add
x y' can be defined as
>>>>>>>>> follows:
>>>>>>>>> Given an input state s, next state s' =
s[z -> s(x) + s(y)]
>>>>>>>>> where s(x) is the value of x in the
previous state, and s[z ->
>>>>>>>>> v] is a
>>>>>>>>> state with z updated to v.
>>>>>>>>>
>>>>>>>>> Another example that involves memory: the
semantics of 'i =
>>>>>>>>> ptrtoint p'
>>>>>>>> can
>>>>>>>>> be defined as follows:
>>>>>>>>> Given an input state s, next state s' =
s[i -> s(p).obj.address +
>>>>>>>>> s(p).offset]
>>>>>>>>> where obj.address is the begin address of a
memory object obj
>>>>>>>>> pointed
>>>>>> by
>>>>>>>> p
>>>>>>>>> & offset is p's byte offset.
(Imagine a pointer to the offset
>>>>>>>>> of some
>>>>>>>> char
>>>>>>>>> array).
>>>>>>>>> Note that ptrtoint & add can be nicely
defined w.r.t the input
>>>>>>>>> state.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Now, the instruction that we're facing
is 'p = alloca'.
>>>>>>>>> To describe the output state after
executing 'p = alloca', the
>>>>>>>>> address
>>>>>> of
>>>>>>>>> new alloca should be determined.
>>>>>>>>> If observedness is involved, we need to
know the future state
>>>>>>>>> again. :/
>>>>>>>> We
>>>>>>>>> don't know whether the alloca is going
to be observed or not
>>>>>>>>> without
>>>>>>>> seeing
>>>>>>>>> the future.
>>>>>>>>> This is the problem of the current lifetime
intrinsics as well.
>>>>>>>> No, you mix things up here.
>>>>>>>>
>>>>>>>> Nobody proposed to modify the semantics of
`alloca`.
>>>>>>>> `alloca` provides you with a fresh, unobserved
block of
>>>>>>>> dereferenceable memory that is implicitly freed
as the stack
>>>>>>>> unwinds. That is it. No context necessary.
>>>>>>>>
>>>>>>>> If you want to modify the IR, you need to argue
the observable
>>>>>>>> semantics which is nothing new. That this might
require more than
>>>>>>>> a peephole view of the program is also not new.
>>>>>>>>
>>>>>>>>
>>>>>>>>> One possible approach to resolve this is
adding an
>>>>>>>>> 'unobserved' flag to
>>>>>>>>> alloca instruction (similar to what was
suggested by Nicolai).
>>>>>>>>> And, we can say that if alloca with
'unobserved' is used by
>>>>>>>> ptrtoint/icmp,
>>>>>>>>> it is UB.
>>>>>>>> The flag can be added, like we add other
attributes. It should not
>>>>>>>> be required for any optimization we talked
about though. It
>>>>>>>> basically
>>>>>>>> is a way to manifest derived or given
information into the IR.
>>>>>>>>
>>>>>>>> Attribute deduction, as well as frontends with
domain knowledge,
>>>>>>>> can add such information. The flag we discussed
in phab was not
>>>>>>>> even
>>>>>>>> sufficient for all the transformation examples
I presented in
>>>>>>>> my mail,
>>>>>>>> that is why I extended my argument. We could
still have a
>>>>>>>> "noescape"
>>>>>>>> flag for allocas, but I'm not sure how
useful that really is.
>>>>>>>> We can
>>>>>>>> certainly deduce it and manifest it, unsure if
we have domain
>>>>>>>> knowledge
>>>>>>>> we can use for non-trivial cases though.
>>>>>>>>
>>>>>>>>
>>>>>>>>> But this makes ptrtoint/icmp make
UB-raising instructions, which
>>>>>>>>> contradicts with what LLVM does.
>>>>>>>> As with other violation of attributes I would,
on first though,
>>>>>>>> suggest
>>>>>>>> to produce poison, not UB.
>>>>>>>>
>>>>>>>>
>>>>>>>>> Also, existing optimizations like loop
versioning can introduce
>>>>>>>>> ptrtoint/pointer comparisons too.
>>>>>>>> Sure. I am not certain why that is a problem. I
get the feeling
>>>>>>>> things
>>>>>>>> are still mixed up here.
>>>>>>>>
>>>>>>>> What I proposed is twofold:
>>>>>>>>
>>>>>>>> 1) We stop folding comparisons between
different allocas if
>>>>>>>> changing the
>>>>>>>> address
>>>>>>>> of both might be observable. Thus, if
both might have
>>>>>>>> their address
>>>>>>>> "taken"/escaped,
>>>>>>>> other than the comparisons we want to
fold, we cannot
>>>>>>>> proceed.
>>>>>>>>
>>>>>>>> 2) We define lifetime markers to mean
`memset(undef)`.
>>>>>>>>
>>>>>>>> The first should be sufficient for the problem
you were trying
>>>>>>>> to solve
>>>>>>>> in the
>>>>>>>> first place. The second makes lifetime markers
less weird. Note
>>>>>>>> that 1)
>>>>>>>> is not changing
>>>>>>>> the semantics of the IR. We basically just
argue there is a bug
>>>>>>>> in our
>>>>>>>> instcombine right
>>>>>>>> now as we do not check all necessary
preconditions.
>>>>>>>>
>>>>>>>>
>>>>>>>>> I see that there are other questions that I
didn't answer yet,
>>>>>>>>> but let
>>>>>> me
>>>>>>>>> answer this first to limit the length of
the text :)
>>>>>>>> Sure, we can split the discussion :)
>>>>>>>>
>>>>>>>> ~ Johannes
>>>>>>>>
>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Juneyoung
>>>>>>>>>
>>>>>>>>> On Thu, Jan 7, 2021 at 3:36 AM Johannes
Doerfert <
>>>>>>>> johannesdoerfert at gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> On 1/5/21 8:00 PM, Juneyoung Lee wrote:
>>>>>>>>>>> Hi Johannes,
>>>>>>>>>>>
>>>>>>>>>>> I read your proposal and thought
about the model.
>>>>>>>>>> Cool, thanks!
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> As you concerned in A3, certain
programs may be valid only when
>>>>>> memory
>>>>>>>>>>> blocks with overlapping lifetimes
have disjoint addresses.
>>>>>>>>>>> Look at this example (I'm using
malloc, but alloca also works):
>>>>>>>>>>>
>>>>>>>>>>> p1 = malloc(4)
>>>>>>>>>>> p2 = malloc(4) // for brevity,
assume that there as enough
>>>>>>>>>>> space & p1
>>>>>>>> and
>>>>>>>>>>> p2 != null
>>>>>>>>>>> set<char*> s;
>>>>>>>>>>> s.insert(p1); s.insert(p2); // If
the second insert did
>>>>>>>>>>> nothing, it
>>>>>>>> would
>>>>>>>>>>> be surprise to programmers
>>>>>>>>>>> work(s);
>>>>>>>>>>> free(data1)
>>>>>>>>>>> free(data2)
>>>>>>>>>>>
>>>>>>>>>>> Clearly, IR semantics should
guarantee that escaped blocks are
>>>>>>>> disjoint.
>>>>>>>>>> It
>>>>>>>>>>> would be great for verification
tools on LLVM IR to be able
>>>>>>>>>>> to answer
>>>>>>>>>> that
>>>>>>>>>>> the second insert will succeed.
>>>>>>>>>> I agree, the second insert should
succeed, assuming `p1 && p2`.
>>>>>>>>>> I don't think my proposal would in
any way impact the program
>>>>>>>>>> above,
>>>>>>>>>> if anything the A3 reasoning makes sure
such a program with
>>>>>>>>>> allocas
>>>>>>>>>> is not miscompiled.
>>>>>>>>>>
>>>>>>>>>> I'm also not sure I understand what
you try to argue for. Maybe
>>>>>>>>>> elaborate a bit what it is you think is
bad or needs to be
>>>>>>>>>> changed.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> The problem is that definition of
escapedness is not clear
>>>>>>>>>>> at the
>>>>>>>>>> semantic
>>>>>>>>>>> level. Describing the IR semantics
w.r.t. LLVM's escaped
>>>>>>>>>>> analysis
>>>>>>>>>> function
>>>>>>>>>>> isn't something we want.
>>>>>>>>>>>
>>>>>>>>>>> Semantic definition of escapedness
of a pointer seems hard,
>>>>>>>>>>> I mean
>>>>>> in a
>>>>>>>>>>> stepwise manner.
>>>>>>>>>>> It isn't a single instruction
such as 'escape i8* ptr', and
>>>>>>>>>>> we need
>>>>>> to
>>>>>>>>>> look
>>>>>>>>>>> over all instructions in the
function. For example,
>>>>>>>>>>> '(int)(p+1) -
>>>>>>>> (int)p'
>>>>>>>>>>> isn't semantically escaping the
pointer p because the result
>>>>>>>>>>> is 1
>>>>>>>>>>> regardless of the value of p.
>>>>>>>>>>> Stepwisely defining the semantics
of instructions is a
>>>>>>>>>>> desirable
>>>>>>>>>> direction
>>>>>>>>>>> IMO.
>>>>>>>>>> I'm confused. What in the proposal
would prevent us from
>>>>>>>>>> defining
>>>>>>>>>> the semantics of instructions, or force
us to do it in an
>>>>>>>>>> "undesirable
>>>>>>>>>> way"?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> In practice, syntactically checking
escapedness + nice
>>>>>>>>>>> engineering
>>>>>>>> might
>>>>>>>>>>> not break optimizations in most
cases (as undef/poison did);
>>>>>>>>>>> but it
>>>>>>>> would
>>>>>>>>>>> be great to move to another level,
since LLVM IR is used in
>>>>>>>>>>> so many
>>>>>>>>>> places
>>>>>>>>>>> :)
>>>>>>>>>> The property under which you can
coalesce objects is simple:
>>>>>>>>>> It is not observable.
>>>>>>>>>>
>>>>>>>>>> Now, if the address of one of the two
objects you coalesce is
>>>>>>>>>> not
>>>>>>>>>> observed, coalescing is not observable.
That is a sufficient
>>>>>>>>>> condition
>>>>>>>>>> not a necessary one. Pointer
"escaping" is one step further.
>>>>>>>>>> If the
>>>>>>>>>> address doesn't escape it is not
observed. This does not mean
>>>>>>>>>> the
>>>>>>>>>> "semantic conditions for
coalescing", e.g., for the purpose of
>>>>>>>> translation
>>>>>>>>>> validation, is supposed to be build on
top of our "definition of
>>>>>>>> escaping
>>>>>>>>>> pointers". That said, we use
"does not escape" as a
>>>>>>>>>> precondition for
>>>>>>>>>> various transformation and I'm
unsure what is any different
>>>>>>>>>> now. The
>>>>>>>>>> entire escape argument is only used in
the validity of the
>>>>>>>>>> pointer
>>>>>>>> folding.
>>>>>>>>>> Similarly, we can fold a comparison of
a noalias pointer with
>>>>>>>>>> another
>>>>>>>> one
>>>>>>>>>> if the former does not escape (and both
are dereferenced and
>>>>>>>>>> one is
>>>>>>>>>> written for sure).
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> The pointer comparison is another
beast. It indeed has a few
>>>>>>>>>>> issues,
>>>>>>>> and
>>>>>>>>>>> solving it might require nontrivial
solution.
>>>>>>>>>> I think the main problem of the
inconsistencies and such
>>>>>>>>>> we've seen is
>>>>>>>>>> rooted in the erroneous folding of
pointer comparisons.
>>>>>>>>>> Cleaning up
>>>>>> the
>>>>>>>>>> lifetime marker semantics is actually
unrelated and simply
>>>>>>>>>> not folding
>>>>>>>>>> as described in A3 should solve the
issue that has been
>>>>>>>>>> reported.
>>>>>>>>>> Nevertheless,
>>>>>>>>>> we should take a crack at lifetime
markers while we are here.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> ~ Johannes
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Juneyoung
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Jan 5, 2021 at 9:37 AM
Johannes Doerfert <
>>>>>>>>>> johannesdoerfert at gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Juneyoung,
>>>>>>>>>>>>
>>>>>>>>>>>> Happy new year :)
>>>>>>>>>>>>
>>>>>>>>>>>> After we had a lengthy
discussion on phab last year, I've
>>>>>>>>>>>> tried to
>>>>>>>>>>>> summarize my thoughts,
>>>>>>>>>>>> especially given that I had
some time to think about things
>>>>>>>>>>>> over the
>>>>>>>>>> break.
>>>>>>>>>>>> Still, no promises on the
quality ;)
>>>>>>>>>>>>
>>>>>>>>>>>> I start with general questions
I asked myself and then go on
>>>>>> rambling
>>>>>>>>>>>> about a potential design.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Q1: Is lifetime a given
property or a derived one, thus is
>>>>>>>>>>>> it fixed
>>>>>> or
>>>>>>>>>>>> modifiable?
>>>>>>>>>>>>
>>>>>>>>>>>> This is a question I asked
myself a lot recently. I think
>>>>>>>>>>>> it is
>>>>>>>> derived
>>>>>>>>>>>> and modifiable,
>>>>>>>>>>>> at least I hope it is. Only
that would allow
>>>>>>>>>>>> transformations I would
>>>>>>>>>> want
>>>>>>>>>>>> us to do. Here are some
examples:
>>>>>>>>>>>>
https://godbolt.org/z/G8obj3
>>>>>>>>>>>>
https://godbolt.org/z/obaTc
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Q2: Is a pointer comparison, or
similar use, extending the
>>>>>>>>>>>> lifetime?
>>>>>>>>>>>>
>>>>>>>>>>>> Asked differently, can we hoist
a pointer comparison into a
>>>>>>>>>>>> region
>>>>>>>> where
>>>>>>>>>>>> the pointer is dead?
>>>>>>>>>>>> This is an important question
which we haven't discussed
>>>>>>>>>>>> much as we
>>>>>>>>>>>> assumed LICM has to work.
>>>>>>>>>>>>
>>>>>>>>>>>> The current behavior is that
non-dereferencing uses are not
>>>>>> extending
>>>>>>>>>>>> the lifetime and are
>>>>>>>>>>>> allowed outside of
"lifetime regions" (as indicated by
>>>>>>>>>>>> markers).
>>>>>> They
>>>>>>>>>>>> will always produce valid
>>>>>>>>>>>> results. Though, we might want
to think about a lifetime
>>>>>>>>>>>> marker that
>>>>>>>>>>>> spits out a new pointer
>>>>>>>>>>>> value instead of reusing the
old one.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Q3: Can we use lifetime to
argue about addresses?
>>>>>>>>>>>>
>>>>>>>>>>>> The question here is, can we
fold address comparisons based on
>>>>>>>>>>>> lifetimes, or not.
>>>>>>>>>>>>
>>>>>>>>>>>> So far, we fold comparisons
based on "address information".
>>>>>>>>>>>> For
>>>>>>>> example,
>>>>>>>>>>>> we "know" globals,
>>>>>>>>>>>> allocas, and mallocs cannot be
equal to one another. Also, two
>>>>>>>> distinct
>>>>>>>>>>>> allocations, for globals
>>>>>>>>>>>> and allocas, are considered
unequal. Now, the crux is that
>>>>>>>>>>>> we have
>>>>>> to
>>>>>>>> be
>>>>>>>>>>>> consistent if we do two
>>>>>>>>>>>> comparisons, and, as of now, we
are not (bug number
>>>>>>>>>>>> missing). Since
>>>>>>>> the
>>>>>>>>>>>> backend (or any other place
>>>>>>>>>>>> for that matter) might coalesce
allocas, their addresses
>>>>>>>>>>>> might not
>>>>>> be
>>>>>>>>>>>> different after all. If we
>>>>>>>>>>>> already folded a comparison to
"unequal" we are doomed if
>>>>>>>>>>>> we later
>>>>>>>> have
>>>>>>>>>>>> a comparison that results
>>>>>>>>>>>> in "equal". (Note,
this is different from aliasing rules as
>>>>>>>>>>>> they can
>>>>>>>> be
>>>>>>>>>>>> stricter.)
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Design:
>>>>>>>>>>>>
>>>>>>>>>>>> I would hope we can come up
with a model that treats memory
>>>>>>>>>>>> "the
>>>>>>>> same",
>>>>>>>>>>>> regardless if it is global,
>>>>>>>>>>>> stack, or heap. I want to avoid
special casing one of them
>>>>>>>>>>>> wrt.
>>>>>>>> lifetime
>>>>>>>>>>>> as I believe most optimizations
>>>>>>>>>>>> would apply to any of them,
potentially for different
>>>>>>>>>>>> reasons and
>>>>>> with
>>>>>>>>>>>> different gains, but
nevertheless.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Proposal (largely based on the
conversation in phab):
>>>>>>>>>>>>
>>>>>>>>>>>> A1: Lifetime is a concept that
talks about memory content
>>>>>>>>>>>> *only*.
>>>>>>>>>>>> Basically, the memory content
is set to
>>>>>>>>>>>> undefined by lifetime
markers. It is
>>>>>>>>>>>> derived/modifiable as
>>>>>> it
>>>>>>>> is
>>>>>>>>>>>> just an "as-is"
property of the memory
>>>>>>>>>>>> content. The lifetimes
of an object, as described by
>>>>>> markers,
>>>>>>>>>> might
>>>>>>>>>>>> change during the compilation.
They
>>>>>>>>>>>> might become smaller
if we deduce the object is
>>>>>>>>>>>> not accessed
>>>>>>>> and
>>>>>>>>>>>> the memory content is not used,
they
>>>>>>>>>>>> might become larger if
objects with non-overlapping
>>>>>> lifetimes
>>>>>>>> are
>>>>>>>>>>>> coalesced. (One could see the
latter as
>>>>>>>>>>>> introducing a new
object though.)
>>>>>>>>>>>>
>>>>>>>>>>>> A2: If we define lifetime as
above, it has nothing to do
>>>>>>>>>>>> with the
>>>>>>>>>>>> address of an object.
Consequently, pointer
>>>>>>>>>>>> comparisons and
similar operations are valid
>>>>>>>>>>>> outside the
>>>>>>>> lifetime.
>>>>>>>>>>>> Loads and stores are as well,
they can
>>>>>>>>>>>> even not be removed
"easily". A store followed by
>>>>>>>>>>>> a lifetime
>>>>>>>>>> marker
>>>>>>>>>>>> or a load following a lifetime
marker
>>>>>>>>>>>> is dead or results in
undef respectively.
>>>>>>>>>>>>
>>>>>>>>>>>> A3: We could not use lifetime
to argue about addresses.
>>>>>>>>>>>> This means
>>>>>> we
>>>>>>>>>>>> could/should also not argue
that overlapping
>>>>>>>>>>>> lifetimes result in
different addresses. Thus, a
>>>>>>>>>>>> comparison
>>>>>>>>>> between
>>>>>>>>>>>> the address of two allocas
could not
>>>>>>>>>>>> immediately be folded.
That said, there would be
>>>>>>>>>>>> special
>>>>>> cases
>>>>>>>>>>>> though. Basically, if one of
the allocas does
>>>>>>>>>>>> not escape, other than
the comparisons to be
>>>>>>>>>>>> folded, we can
>>>>>>>> fold
>>>>>>>>>>>> them. Afterwards, coalescing or
splitting
>>>>>>>>>>>> would still be
consistent because it is
>>>>>>>>>>>> unobservable. The
>>>>>>>> problem
>>>>>>>>>>>> we have in-tree is that we fold
even though
>>>>>>>>>>>> the address is still
observed (after the fold). It
>>>>>>>>>>>> is still
>>>>>>>>>> unclear
>>>>>>>>>>>> to me what the impact of this
would be
>>>>>>>>>>>> on real code. I
suggested before that we run some
>>>>>> experiments
>>>>>>>>>> first
>>>>>>>>>>>> before we make any decision
whatsoever.
>>>>>>>>>>>>
>>>>>>>>>>>> This is pretty much saying that
lifetime markers are
>>>>>> `memset(undef)`,
>>>>>>>> as
>>>>>>>>>>>> you suggested before (I think).
>>>>>>>>>>>> There are some implementation
level differences but at the
>>>>>>>>>>>> end of
>>>>>> the
>>>>>>>>>>>> day they are basically the
same.
>>>>>>>>>>>>
>>>>>>>>>>>> Happy to hear some thoughts on
this, especially if it fixes
>>>>>>>>>>>> the
>>>>>>>> problems
>>>>>>>>>>>> that lead to D93376 in the
first place.
>>>>>>>>>>>>
>>>>>>>>>>>> ~ Johannes
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 12/18/20 2:42 AM, Juneyoung
Lee via llvm-dev wrote:
>>>>>>>>>>>>> Hello all,
>>>>>>>>>>>>>
>>>>>>>>>>>>> We're discussing the
well-formedness of lifetime.start/end
>>>>>> intrinsic
>>>>>>>>>>>> here (
>>>>>>>>>>>>>
https://reviews.llvm.org/D93376), deciding what is a
>>>>>> (syntactically
>>>>>>>> &
>>>>>>>>>>>>> semantically) valid use of
these intrinsics.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I'd like to gather more
context about the intrinsics.
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1. Is there a frontend or
framework that introduces
>>>>>>>>>>>>> lifetime call
>>>>>>>> with
>>>>>>>>>>>>> non-stack allocated
objects?
>>>>>>>>>>>>> If exists, which behavior
do they expect from it?
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2. Can a block's
lifetime be started bytewise or elementwise?
>>>>>>>>>>>>> I imagine an optimization
that allow using stack very
>>>>>>>>>>>>> compactly,
>>>>>> but
>>>>>>>>>>>> wonder
>>>>>>>>>>>>> there is a real-world use
case.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Juneyoung
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
_______________________________________________
>>>>>>>>>>>>> LLVM Developers mailing
list
>>>>>>>>>>>>> llvm-dev at lists.llvm.org
>>>>>>>>>>>>>
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>>
>>>
>