thr3ads.net - llvm dev - [llvm-dev] lifetime.start/end [Jan 2021]

If this information is useful, please help other people find it:
Share via:

Ralf Jung via llvm-dev

2021-Jan-26 10:22 UTC

[llvm-dev] lifetime.start/end

Hi Johannes,
> On 1/24/21 10:07 AM, Ralf Jung wrote:
>> Hi Johannes,
>>
>>> At some point I lost track of what you even want to achieve.
>>
>> *lol* fair.^^
>> All I want is a precise and meaningful spec for lifetime.start and 
>> lifetime.end. That's where we entered this rabbit hole. You
didn't like
>> Juneyoung's and Nuno's proposal for this and instead suggested
to replace the
>> entire system by something else, and that new system is what we are
discussing
>> now (that is my understanding, anyway).
>>
> I went back to the proposal in question, https://reviews.llvm.org/D93376,
and I
> could not right away determine why this would avoid the inconsistency we
can
> imagine for something like
> 
> ```
> int *G = 0;
> void bar(int *p) __noinline__ { if (G) print(G == p); else G = p;
> void foo() {
>    intptr_t xp, yp;
>    { int x; xp = &x; bar(&x); }
>    { int y; yp = &y; bar(&y); }
>    print(xp == yp);               // One can imagine this is folded to
false and
> because x and y are coalesced bar will print true.
> }
> 
> ```
> 
> I was opposing the proposal because of the brittle syntactic restrictions.
In my
> attempt to justify not having those I also tried to resolve the above 
> inconsistency in a more general manner because I thought that was what it
was
> all about. I believe that we either have to argue the address of a dead
object
> is unusable/undetermined, basically the direction of James comment, or we
need
> to decouple lifetime markers from locations.
Folding to false and coalescing are inconsistent with each other, yes. It is 
just wrong to perform both of these on the above program, I would say (if we 
interpret it with LLVM semantics, which I assume has no "pointer
zapping"). Is
that up for debate?

However, I do not see the immediate connection to lifetime markers -- the same 
would be true for a program using malloc/free. Coalescing is permissible 
whenever lifetimes are disjoint. Folding to false is only permissible if the 
pointers are *definitely not equal*, e.g. if their lifetimes overlap.
>> Sure, everything in the semantics is up to what the program actually
observes.
>> But I was considering the general case, in which it will not be
possible to
>> prove that some address is not observed. I expect that to be quite
common
>> (certainly, Rust programs really like to take references to local
variables
>> and pass them to other functions), and in those cases, not having 
>> lifetime.start/end will severely restrict program transformations -- as
you
>> observed, even outlining is affected.
>>
>> This is basically the abstract version of the concrete concern that
James raised.
> 
> I get the argument that it can hinder optimizations but it is abstract. It
seems
> pretty simple to test this out, I mean to restrict the folding and the 
> coalescing. The benefit, which I think got ignored here a lot, is that it 
> provides a clear path towards deduction of coalescing opportunities, that
is,
> coalescing in the middle-end or backend w/o lifetime markers. Though,
similar to
> the impact of my proposal, I have no idea if that is even needed.
It is certainly not simple for me to test out anything that would involve 
changing LLVM. ;)
The syntactic condition also provides a clear path to evaluate whether any given
folding or coalescing is correct. Any sufficiently precise semantics will do 
that. The question is just which semantics y'all want. If the semantics
require
"pointer zapping" to remain sufficiently optimizable, that sounds like
a serious
drawback to me.

Kind regards,
Ralf
> 
> ~ Johannes
> 
> 
>>
>> Kind regards,
>> Ralf
>>
>>> A lot of my arguments around alloca/malloc/... is that we can
augment the
>>> information from the frontend by determining if an address is
observed or
>>> not. If it is not, we can limit its allocation, coalesce it, etc.
Lifetime
>>> markers can help us do this but they are not the only way.
>>> So, basically, we do not need lifetime markers to coalesce
alloca/malloc/...
>>> but that they can help.
>>
>>
>>
>>>
>>> Let's try to get out of this rabbit hole. If you have a problem
with the
>>> proposal as outlined in 
>>> https://lists.llvm.org/pipermail/llvm-dev/2021-January/147594.html
, please
>>> let me know. I assume dealing with observed addresses different
than with
>>> non-observede ones (implied by noescape) is the way forward now.
>>>
>>> ~ Johannes
>>>
>>>
>>>
>>>>
>>>> Kind regards,
>>>> Ralf
>>>>
>>>>
>>>>>
>>>>>
>>>>>>
>>>>>> Kind regards,
>>>>>> Ralf
>>>>>>
>>>>>>>
>>>>>>> Does that make some sense?
>>>>>>>
>>>>>>> ~ Johannes
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Kind regards,
>>>>>>>> Ralf
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 1/7/21 10:12 PM, Juneyoung Lee wrote:
>>>>>>>>>> On Fri, Jan 8, 2021 at 8:54 AM Johannes
Doerfert
>>>>>>>>>> <johannesdoerfert at gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> On 1/7/21 4:55 PM, Juneyoung Lee
wrote:
>>>>>>>>>>>> It is, alloca must have its
integral address decided at its
>>>>>>>>>>>> *allocation*
>>>>>>>>>>>> time as well, like 0xFF00... .
>>>>>>>>>>>>
>>>>>>>>>>>> For example, if 0xFFFF0000 ~
0xFF000000 is already allocated, the new
>>>>>>>>>>>> alloca cannot be placed there
unless it is not observed in the future,
>>>>>>>>>>>> according to your proposal.
>>>>>>>>>>>>
>>>>>>>>>>>> But how do we know from the
current state whether the stack variable is
>>>>>>>>>>>> going to be observed or not?
>>>>>>>>>>> With that argument you could
allocate all but 4 bytes, do a malloc(4)
>>>>>>>>>>> and know the address of the
returned pointer (assuming it is not null).
>>>>>>>>>>>
>>>>>>>>>>> What I try to say is, either your
scenario is part of the model and
>>>>>>>>>>> everything we do is broken as you
could "observe" addresses passively,
>>>>>>>>>>> *or*, the abstract machine we use
for semantic reasoning doesn't permit
>>>>>>>>>>> the above reasoning. I really hope
for the latter.
>>>>>>>>>>>
>>>>>>>>>> That's a very valid point..!
>>>>>>>>>>
>>>>>>>>>> Actually, I have a memory model that
addresses this problem.
>>>>>>>>>> The gist is that we can interpret each
allocation instruction as
>>>>>>>>>> *creating
>>>>>>>>>> 2 blocks* and nondeterministically
returning one of them.
>>>>>>>>>> The other one is marked as invalid but
not freed immediately.
>>>>>>>>>> Deallocation
>>>>>>>>>> frees both blocks.
>>>>>>>>>> If there is not enough space for having
two blocks, it is treated as
>>>>>>>>>> out-of-memory.
>>>>>>>>>>
>>>>>>>>>> It makes guessing the address of an
allocation according to memory layout
>>>>>>>>>> invalid UB.
>>>>>>>>>>
>>>>>>>>>> p = malloc(4)
>>>>>>>>>> // If p != null, it is guaranteed there
was at least two 4 byte slots
>>>>>>>>>> available, say 0x100~0x104 and
0x110~0x114.
>>>>>>>>>> // Two blocks are allocated at 0x110
and 0x114, and one of the
>>>>>>>>>> addresses is
>>>>>>>>>> nondeterministically returned.
>>>>>>>>>> // By the nature of nondeterminism, all
executions below should be
>>>>>>>>>> well-defined regardless of the address
of p.
>>>>>>>>>> *(int*)0x100 = 10; // In an execution
where p is 0x110, this raises UB.
>>>>>>>>>>
>>>>>>>>>> The paper link is here
<https://dl.acm.org/doi/10.1145/3276495> FYI.
>>>>>>>>>
>>>>>>>>> Nice! Thanks for the link :)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> To argue differently: Who is to say
there is a stack, or only one,
>>>>>>>>>>> or that alloca allocates memory
"on the one stack"? That is not part of
>>>>>>>>>>> the IR, IMHO. I can write a machine
on which alloca lowers to malloc,
>>>>>>>>>>> I don't even need the free
during stack unwind but that I could do as
>>>>>>>>>>> well if I wanted to.
>>>>>>>>>>
>>>>>>>>>> This is right, the example was for
illustrative purposes. IR does not
>>>>>>>>>> enforce alloca to be placed at
'stack'.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>> Making icmp/ptrtoint yield
poison will still make loop versioning or
>>>>>>>>>>>> pointer rewriting
transformations unsound because these operations now
>>>>>>>>>>> can
>>>>>>>>>>>> create poison (even if pointers
are noundef).
>>>>>>>>>>> I did not say they yield poison, at
least I did not try to say that.
>>>>>>>>>>> What are you referring to exactly?
>>>>>>>>>>>
>>>>>>>>>> I was referring to this:
>>>>>>>>>>
>>>>>>>>>>>> But this makes ptrtoint/icmp
make UB-raising instructions, which
>>>>>>>>>>>> contradicts with what LLVM
does.
>>>>>>>>>>> As with other violation of
attributes I would, on first though, suggest
>>>>>>>>>>> to produce poison, not UB.
>>>>>>>>>> But it is more about the (imaginary)
attribute, so maybe I was
>>>>>>>>>> slightly out
>>>>>>>>>> of topic.
>>>>>>>>>>
>>>>>>>>>>> 2) is fine, I think the suggestion
semantically makes sense
>>>>>>>>>>> perfectly. 1)
>>>>>>>>>>>> is something I'm concerned
about now.
>>>>>>>>>>>>
>>>>>>>>>>>> There are more than pointer
foldings, such as rewriting such
>>>>>>>>>>>> expression,
>>>>>>>>>>>> code motion ptr cmp, introduce
ptr cmp, etc. There is also analysis
>>>>>>>>>>> relying
>>>>>>>>>>>> on ptr cmp.
>>>>>>>>>>>> Defining the correctness of
each of them is something we want to avoid,
>>>>>>>>>>> and
>>>>>>>>>>>> maybe that's why we want to
define precise semantics for things.
>>>>>>>>>>> I don't get the point. My
proposal does not change the semantics of
>>>>>>>>>>> pointer comparisons, at all. I
explicitly mentioned that in the last
>>>>>>>>>>> email.
>>>>>>>>>>>
>>>>>>>>>> Oh okay, I thought it was a part of the
lifetime proposal, but it seems
>>>>>>>>>> more like a separate thing.
>>>>>>>>>> I agree that this requires performance
impact.
>>>>>>>>>> Also investigation of existing
transformations would be needed; Alive2's
>>>>>>>>>> pointer comparison is doing
approximation yet, but if it becomes fully
>>>>>>>>>> precise, it will show something from
running LLVM unit tests I
>>>>>>>>>> believe..! :)
>>>>>>>>>>
>>>>>>>>>>> I think this will be less
aggressive and may give nice feedback to
>>>>>>>>>>>> potential projects that are
using lifetime with non-alloca.
>>>>>>>>>>> The lifetime marker debate,
basically 2) above, is orthogonal to the
>>>>>>>>>>> problem
>>>>>>>>>>> you try to solve. It got mixed in
as lifetime markers were used by
>>>>>>>>>>> StackColoring
>>>>>>>>>>> to perform coalescing but that is
coincidental. You can (arguably)
>>>>>>>>>>> coalesce stack
>>>>>>>>>>> allocations regardless of lifetime
markers and with 1) such a
>>>>>>>>>>> transformation
>>>>>>>>>>> (w/ and w/o lifetime markers) would
actually be sound.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> At the end, we can implement IR
writer that lowers lifetime with
>>>>>>>>>>> non-alloca
>>>>>>>>>>>> into memset(undef). WDYT?
>>>>>>>>>>> Yeah, 2) is orthogonal and we can
lower it that way. Unsure if it is
>>>>>>>>>>> helpful
>>>>>>>>>>> but we can certainly define it that
way in the LangRef.
>>>>>>>>>>>
>>>>>>>>>> Okay, thanks!
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> ~ Johannes
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Juneyoung
>>>>>>>>>>>>
>>>>>>>>>>>> p.s. The reply was late, sorry.
I think I can spend more time on this
>>>>>>>>>>> today.
>>>>>>>>>>>> On Thu, Jan 7, 2021 at 9:02 AM
Johannes Doerfert <
>>>>>>>>>>> johannesdoerfert at gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> On 1/6/21 4:33 PM,
Juneyoung Lee wrote:
>>>>>>>>>>>>>>>> Stepwisely
defining the semantics of instructions is a desirable
>>>>>>>>>>>>>> direction
>>>>>>>>>>>>>>>> IMO.
>>>>>>>>>>>>>>> I'm confused.
What in the proposal would prevent us from defining
>>>>>>>>>>>>>>> the semantics of
instructions, or force us to do it in an
>>>>>>>>>>>>>>> "undesirable
>>>>>>>>>>>>>> way"?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I meant it would be
great if the output state after executing an
>>>>>>>>>>>>>> instruction can be
described using its input state.
>>>>>>>>>>>>>> (that was the meaning
of 'stepwise semantics', I should have been
>>>>>>>>>>>>>> more
>>>>>>>>>>>>>> clear about this)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> For example, the
semantics of 'z = add x y' can be defined as
>>>>>>>>>>>>>> follows:
>>>>>>>>>>>>>> Given an input state s,
next state s' = s[z -> s(x) + s(y)]
>>>>>>>>>>>>>> where s(x) is the value
of x in the previous state, and s[z -> v]
>>>>>>>>>>>>>> is a
>>>>>>>>>>>>>> state with z updated to
v.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Another example that
involves memory: the semantics of 'i =
>>>>>>>>>>>>>> ptrtoint p'
>>>>>>>>>>>>> can
>>>>>>>>>>>>>> be defined as follows:
>>>>>>>>>>>>>> Given an input state s,
next state s' = s[i -> s(p).obj.address +
>>>>>>>>>>>>>> s(p).offset]
>>>>>>>>>>>>>> where obj.address is
the begin address of a memory object obj pointed
>>>>>>>>>>> by
>>>>>>>>>>>>> p
>>>>>>>>>>>>>> & offset is p's
byte offset. (Imagine a pointer to the offset of some
>>>>>>>>>>>>> char
>>>>>>>>>>>>>> array).
>>>>>>>>>>>>>> Note that ptrtoint
& add can be nicely defined w.r.t the input state.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Now, the instruction
that we're facing is 'p = alloca'.
>>>>>>>>>>>>>> To describe the output
state after executing 'p = alloca', the
>>>>>>>>>>>>>> address
>>>>>>>>>>> of
>>>>>>>>>>>>>> new alloca should be
determined.
>>>>>>>>>>>>>> If observedness is
involved, we need to know the future state
>>>>>>>>>>>>>> again. :/
>>>>>>>>>>>>> We
>>>>>>>>>>>>>> don't know whether
the alloca is going to be observed or not without
>>>>>>>>>>>>> seeing
>>>>>>>>>>>>>> the future.
>>>>>>>>>>>>>> This is the problem of
the current lifetime intrinsics as well.
>>>>>>>>>>>>> No, you mix things up here.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Nobody proposed to modify
the semantics of `alloca`.
>>>>>>>>>>>>> `alloca` provides you with
a fresh, unobserved block of
>>>>>>>>>>>>> dereferenceable memory that
is implicitly freed as the stack
>>>>>>>>>>>>> unwinds. That is it. No
context necessary.
>>>>>>>>>>>>>
>>>>>>>>>>>>> If you want to modify the
IR, you need to argue the observable
>>>>>>>>>>>>> semantics which is nothing
new. That this might require more than
>>>>>>>>>>>>> a peephole view of the
program is also not new.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> One possible approach
to resolve this is adding an 'unobserved'
>>>>>>>>>>>>>> flag to
>>>>>>>>>>>>>> alloca instruction
(similar to what was suggested by Nicolai).
>>>>>>>>>>>>>> And, we can say that if
alloca with 'unobserved' is used by
>>>>>>>>>>>>> ptrtoint/icmp,
>>>>>>>>>>>>>> it is UB.
>>>>>>>>>>>>> The flag can be added, like
we add other attributes. It should not
>>>>>>>>>>>>> be required for any
optimization we talked about though. It basically
>>>>>>>>>>>>> is a way to manifest
derived or given information into the IR.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Attribute deduction, as
well as frontends with domain knowledge,
>>>>>>>>>>>>> can add such information.
The flag we discussed in phab was not even
>>>>>>>>>>>>> sufficient for all the
transformation examples I presented in my mail,
>>>>>>>>>>>>> that is why I extended my 
argument. We could still have a "noescape"
>>>>>>>>>>>>> flag for allocas, but
I'm not sure how useful that really is. We can
>>>>>>>>>>>>> certainly deduce it and
manifest it, unsure if we have domain
>>>>>>>>>>>>> knowledge
>>>>>>>>>>>>> we can use for non-trivial
cases though.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> But this makes
ptrtoint/icmp make UB-raising instructions, which
>>>>>>>>>>>>>> contradicts with what
LLVM does.
>>>>>>>>>>>>> As with other violation of
attributes I would, on first though,
>>>>>>>>>>>>> suggest
>>>>>>>>>>>>> to produce poison, not UB.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Also, existing
optimizations like loop versioning can introduce
>>>>>>>>>>>>>> ptrtoint/pointer
comparisons too.
>>>>>>>>>>>>> Sure. I am not certain why
that is a problem. I get the feeling things
>>>>>>>>>>>>> are still mixed up here.
>>>>>>>>>>>>>
>>>>>>>>>>>>> What I proposed is twofold:
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1) We stop folding
comparisons between different allocas if
>>>>>>>>>>>>> changing the
>>>>>>>>>>>>> address
>>>>>>>>>>>>>       of both might be
observable. Thus, if both might have their
>>>>>>>>>>>>> address
>>>>>>>>>>>>> "taken"/escaped,
>>>>>>>>>>>>>       other than the
comparisons we want to fold, we cannot proceed.
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2) We define lifetime
markers to mean `memset(undef)`.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The first should be
sufficient for the problem you were trying to
>>>>>>>>>>>>> solve
>>>>>>>>>>>>> in the
>>>>>>>>>>>>> first place. The second
makes lifetime markers less weird. Note
>>>>>>>>>>>>> that 1)
>>>>>>>>>>>>> is not changing
>>>>>>>>>>>>> the semantics of the IR. We
basically just argue there is a bug in our
>>>>>>>>>>>>> instcombine right
>>>>>>>>>>>>> now as we do not check all
necessary preconditions.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> I see that there are
other questions that I didn't answer yet, but
>>>>>>>>>>>>>> let
>>>>>>>>>>> me
>>>>>>>>>>>>>> answer this first to
limit the length of the text :)
>>>>>>>>>>>>> Sure, we can split the
discussion :)
>>>>>>>>>>>>>
>>>>>>>>>>>>> ~ Johannes
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Juneyoung
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Thu, Jan 7, 2021 at
3:36 AM Johannes Doerfert <
>>>>>>>>>>>>> johannesdoerfert at
gmail.com>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 1/5/21 8:00 PM,
Juneyoung Lee wrote:
>>>>>>>>>>>>>>>> Hi Johannes,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I read your
proposal and thought about the model.
>>>>>>>>>>>>>>> Cool, thanks!
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> As you
concerned in A3, certain programs may be valid only when
>>>>>>>>>>> memory
>>>>>>>>>>>>>>>> blocks with
overlapping lifetimes have disjoint addresses.
>>>>>>>>>>>>>>>> Look at this
example (I'm using malloc, but alloca also works):
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> p1 = malloc(4)
>>>>>>>>>>>>>>>> p2 = malloc(4)
// for brevity, assume that there as enough space
>>>>>>>>>>>>>>>> & p1
>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>> p2 != null
>>>>>>>>>>>>>>>>
set<char*> s;
>>>>>>>>>>>>>>>> s.insert(p1);
s.insert(p2); // If the second insert did nothing, it
>>>>>>>>>>>>> would
>>>>>>>>>>>>>>>> be surprise to
programmers
>>>>>>>>>>>>>>>> work(s);
>>>>>>>>>>>>>>>> free(data1)
>>>>>>>>>>>>>>>> free(data2)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Clearly, IR
semantics should guarantee that escaped blocks are
>>>>>>>>>>>>> disjoint.
>>>>>>>>>>>>>>> It
>>>>>>>>>>>>>>>> would be great
for verification tools on LLVM IR to be able to
>>>>>>>>>>>>>>>> answer
>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>> the second
insert will succeed.
>>>>>>>>>>>>>>> I agree, the second
insert should succeed, assuming `p1 && p2`.
>>>>>>>>>>>>>>> I don't think
my proposal would in any way impact the program above,
>>>>>>>>>>>>>>> if anything the A3
reasoning makes sure such a program with allocas
>>>>>>>>>>>>>>> is not miscompiled.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I'm also not
sure I understand what you try to argue for. Maybe
>>>>>>>>>>>>>>> elaborate a bit
what it is you think is bad or needs to be changed.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The problem is
that definition of escapedness is not clear at the
>>>>>>>>>>>>>>> semantic
>>>>>>>>>>>>>>>> level.
Describing the IR semantics w.r.t. LLVM's escaped analysis
>>>>>>>>>>>>>>> function
>>>>>>>>>>>>>>>> isn't
something we want.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Semantic
definition of escapedness of a pointer seems hard, I mean
>>>>>>>>>>> in a
>>>>>>>>>>>>>>>> stepwise
manner.
>>>>>>>>>>>>>>>> It isn't a
single instruction such as 'escape i8* ptr', and we need
>>>>>>>>>>> to
>>>>>>>>>>>>>>> look
>>>>>>>>>>>>>>>> over all
instructions in the function. For example, '(int)(p+1) -
>>>>>>>>>>>>> (int)p'
>>>>>>>>>>>>>>>> isn't
semantically escaping the pointer p because the result is 1
>>>>>>>>>>>>>>>> regardless of
the value of p.
>>>>>>>>>>>>>>>> Stepwisely
defining the semantics of instructions is a desirable
>>>>>>>>>>>>>>> direction
>>>>>>>>>>>>>>>> IMO.
>>>>>>>>>>>>>>> I'm confused.
What in the proposal would prevent us from defining
>>>>>>>>>>>>>>> the semantics of
instructions, or force us to do it in an
>>>>>>>>>>>>>>> "undesirable
>>>>>>>>>>>>>>> way"?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> In practice,
syntactically checking escapedness + nice engineering
>>>>>>>>>>>>> might
>>>>>>>>>>>>>>>> not break
optimizations in most cases (as undef/poison did); but it
>>>>>>>>>>>>> would
>>>>>>>>>>>>>>>> be great to
move to another level, since LLVM IR is used in so many
>>>>>>>>>>>>>>> places
>>>>>>>>>>>>>>>> :)
>>>>>>>>>>>>>>> The property under
which you can coalesce objects is simple:
>>>>>>>>>>>>>>>       It is not
observable.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Now, if the address
of one of the two objects you coalesce is not
>>>>>>>>>>>>>>> observed,
coalescing is not observable. That is a sufficient
>>>>>>>>>>>>>>> condition
>>>>>>>>>>>>>>> not a necessary
one. Pointer "escaping" is one step further. If the
>>>>>>>>>>>>>>> address doesn't
escape it is not observed. This does not mean the
>>>>>>>>>>>>>>> "semantic
conditions for coalescing", e.g., for the purpose of
>>>>>>>>>>>>> translation
>>>>>>>>>>>>>>> validation, is
supposed to be build on top of our "definition of
>>>>>>>>>>>>> escaping
>>>>>>>>>>>>>>> pointers".
That said, we use "does not escape" as a precondition for
>>>>>>>>>>>>>>> various
transformation and I'm unsure what is any different now. The
>>>>>>>>>>>>>>> entire escape
argument is only used in the validity of the pointer
>>>>>>>>>>>>> folding.
>>>>>>>>>>>>>>> Similarly, we can
fold a comparison of a noalias pointer with
>>>>>>>>>>>>>>> another
>>>>>>>>>>>>> one
>>>>>>>>>>>>>>> if the former does
not escape (and both are dereferenced and one is
>>>>>>>>>>>>>>> written for sure).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The pointer
comparison is another beast. It indeed has a few
>>>>>>>>>>>>>>>> issues,
>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>> solving it
might require nontrivial solution.
>>>>>>>>>>>>>>> I think the main
problem of the inconsistencies and such we've
>>>>>>>>>>>>>>> seen is
>>>>>>>>>>>>>>> rooted in the
erroneous folding of pointer comparisons. Cleaning up
>>>>>>>>>>> the
>>>>>>>>>>>>>>> lifetime marker
semantics is actually unrelated and simply not
>>>>>>>>>>>>>>> folding
>>>>>>>>>>>>>>> as described in A3
should solve the issue that has been reported.
>>>>>>>>>>>>>>> Nevertheless,
>>>>>>>>>>>>>>> we should take a
crack at lifetime markers while we are here.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> ~ Johannes
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>> Juneyoung
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Tue, Jan 5,
2021 at 9:37 AM Johannes Doerfert <
>>>>>>>>>>>>>>> johannesdoerfert at
gmail.com>
>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi
Juneyoung,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Happy new
year :)
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> After we
had a lengthy discussion on phab last year, I've tried to
>>>>>>>>>>>>>>>>> summarize
my thoughts,
>>>>>>>>>>>>>>>>> especially
given that I had some time to think about things
>>>>>>>>>>>>>>>>> over the
>>>>>>>>>>>>>>> break.
>>>>>>>>>>>>>>>>> Still, no
promises on the quality ;)
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I start
with general questions I asked myself and then go on
>>>>>>>>>>> rambling
>>>>>>>>>>>>>>>>> about a
potential design.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Q1: Is
lifetime a given property or a derived one, thus is it
>>>>>>>>>>>>>>>>> fixed
>>>>>>>>>>> or
>>>>>>>>>>>>>>>>> modifiable?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> This is a
question I asked myself a lot recently. I think it is
>>>>>>>>>>>>> derived
>>>>>>>>>>>>>>>>> and
modifiable,
>>>>>>>>>>>>>>>>> at least I
hope it is. Only that would allow transformations I
>>>>>>>>>>>>>>>>> would
>>>>>>>>>>>>>>> want
>>>>>>>>>>>>>>>>> us to do.
Here are some examples:
>>>>>>>>>>>>>>>>>       
https://godbolt.org/z/G8obj3
>>>>>>>>>>>>>>>>>       
https://godbolt.org/z/obaTc
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Q2: Is a
pointer comparison, or similar use, extending the
>>>>>>>>>>>>>>>>> lifetime?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Asked
differently, can we hoist a pointer comparison into a region
>>>>>>>>>>>>> where
>>>>>>>>>>>>>>>>> the pointer
is dead?
>>>>>>>>>>>>>>>>> This is an
important question which we haven't discussed much
>>>>>>>>>>>>>>>>> as we
>>>>>>>>>>>>>>>>> assumed
LICM has to work.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The current
behavior is that non-dereferencing uses are not
>>>>>>>>>>> extending
>>>>>>>>>>>>>>>>> the
lifetime and are
>>>>>>>>>>>>>>>>> allowed
outside of "lifetime regions" (as indicated by markers).
>>>>>>>>>>> They
>>>>>>>>>>>>>>>>> will always
produce valid
>>>>>>>>>>>>>>>>> results.
Though, we might want to think about a lifetime marker
>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>> spits out a
new pointer
>>>>>>>>>>>>>>>>> value
instead of reusing the old one.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Q3: Can we
use lifetime to argue about addresses?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The
question here is, can we fold address comparisons based on
>>>>>>>>>>>>>>>>> lifetimes,
or not.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> So far, we
fold comparisons based on "address information". For
>>>>>>>>>>>>> example,
>>>>>>>>>>>>>>>>> we
"know" globals,
>>>>>>>>>>>>>>>>> allocas,
and mallocs cannot be equal to one another. Also, two
>>>>>>>>>>>>> distinct
>>>>>>>>>>>>>>>>>
allocations, for globals
>>>>>>>>>>>>>>>>> and
allocas, are considered unequal. Now, the crux is that we have
>>>>>>>>>>> to
>>>>>>>>>>>>> be
>>>>>>>>>>>>>>>>> consistent
if we do two
>>>>>>>>>>>>>>>>>
comparisons, and, as of now, we are not (bug number missing).
>>>>>>>>>>>>>>>>> Since
>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>> backend (or
any other place
>>>>>>>>>>>>>>>>> for that
matter) might coalesce allocas, their addresses might not
>>>>>>>>>>> be
>>>>>>>>>>>>>>>>> different
after all. If we
>>>>>>>>>>>>>>>>> already
folded a comparison to "unequal" we are doomed if we later
>>>>>>>>>>>>> have
>>>>>>>>>>>>>>>>> a
comparison that results
>>>>>>>>>>>>>>>>> in
"equal". (Note, this is different from aliasing rules as
>>>>>>>>>>>>>>>>> they can
>>>>>>>>>>>>> be
>>>>>>>>>>>>>>>>> stricter.)
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Design:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I would
hope we can come up with a model that treats memory "the
>>>>>>>>>>>>> same",
>>>>>>>>>>>>>>>>> regardless
if it is global,
>>>>>>>>>>>>>>>>> stack, or
heap. I want to avoid special casing one of them wrt.
>>>>>>>>>>>>> lifetime
>>>>>>>>>>>>>>>>> as I
believe most optimizations
>>>>>>>>>>>>>>>>> would apply
to any of them, potentially for different reasons and
>>>>>>>>>>> with
>>>>>>>>>>>>>>>>> different
gains, but nevertheless.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Proposal
(largely based on the conversation in phab):
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> A1:
Lifetime is a concept that talks about memory content *only*.
>>>>>>>>>>>>>>>>> Basically,
the memory content is set to
>>>>>>>>>>>>>>>>>         
undefined by lifetime markers. It is
>>>>>>>>>>>>>>>>>
derived/modifiable as
>>>>>>>>>>> it
>>>>>>>>>>>>> is
>>>>>>>>>>>>>>>>> just an
"as-is" property of the memory
>>>>>>>>>>>>>>>>>         
content. The lifetimes of an object, as described by
>>>>>>>>>>> markers,
>>>>>>>>>>>>>>> might
>>>>>>>>>>>>>>>>> change
during the compilation. They
>>>>>>>>>>>>>>>>>         
might become smaller if we deduce the object is not
>>>>>>>>>>>>>>>>> accessed
>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>> the memory
content is not used, they
>>>>>>>>>>>>>>>>>         
might become larger if objects with non-overlapping
>>>>>>>>>>> lifetimes
>>>>>>>>>>>>> are
>>>>>>>>>>>>>>>>> coalesced.
(One could see the latter as
>>>>>>>>>>>>>>>>>         
introducing a new object though.)
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> A2: If we
define lifetime as above, it has nothing to do with the
>>>>>>>>>>>>>>>>> address of
an object. Consequently, pointer
>>>>>>>>>>>>>>>>>         
comparisons and similar operations are valid outside the
>>>>>>>>>>>>> lifetime.
>>>>>>>>>>>>>>>>> Loads and
stores are as well, they can
>>>>>>>>>>>>>>>>>         
even not be removed "easily". A store followed by a
>>>>>>>>>>>>>>>>> lifetime
>>>>>>>>>>>>>>> marker
>>>>>>>>>>>>>>>>> or a load
following a lifetime marker
>>>>>>>>>>>>>>>>>          is
dead or results in undef respectively.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> A3: We
could not use lifetime to argue about addresses. This means
>>>>>>>>>>> we
>>>>>>>>>>>>>>>>>
could/should also not argue that overlapping
>>>>>>>>>>>>>>>>>         
lifetimes result in different addresses. Thus, a
>>>>>>>>>>>>>>>>> comparison
>>>>>>>>>>>>>>> between
>>>>>>>>>>>>>>>>> the address
of two allocas could not
>>>>>>>>>>>>>>>>>         
immediately be folded. That said, there would be special
>>>>>>>>>>> cases
>>>>>>>>>>>>>>>>> though.
Basically, if one of the allocas does
>>>>>>>>>>>>>>>>>         
not escape, other than the comparisons to be folded,
>>>>>>>>>>>>>>>>> we can
>>>>>>>>>>>>> fold
>>>>>>>>>>>>>>>>> them.
Afterwards, coalescing or splitting
>>>>>>>>>>>>>>>>>         
would still be consistent because it is unobservable. The
>>>>>>>>>>>>> problem
>>>>>>>>>>>>>>>>> we have
in-tree is that we fold even though
>>>>>>>>>>>>>>>>>         
the address is still observed (after the fold). It is
>>>>>>>>>>>>>>>>> still
>>>>>>>>>>>>>>> unclear
>>>>>>>>>>>>>>>>> to me what
the impact of this would be
>>>>>>>>>>>>>>>>>          on
real code. I suggested before that we run some
>>>>>>>>>>> experiments
>>>>>>>>>>>>>>> first
>>>>>>>>>>>>>>>>> before we
make any decision whatsoever.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> This is
pretty much saying that lifetime markers are
>>>>>>>>>>> `memset(undef)`,
>>>>>>>>>>>>> as
>>>>>>>>>>>>>>>>> you
suggested before (I think).
>>>>>>>>>>>>>>>>> There are
some implementation level differences but at the end of
>>>>>>>>>>> the
>>>>>>>>>>>>>>>>> day they
are basically the same.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Happy to
hear some thoughts on this, especially if it fixes the
>>>>>>>>>>>>> problems
>>>>>>>>>>>>>>>>> that lead
to D93376 in the first place.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> ~ Johannes
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On 12/18/20
2:42 AM, Juneyoung Lee via llvm-dev wrote:
>>>>>>>>>>>>>>>>>> Hello
all,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
We're discussing the well-formedness of lifetime.start/end
>>>>>>>>>>> intrinsic
>>>>>>>>>>>>>>>>> here (
>>>>>>>>>>>>>>>>>>
https://reviews.llvm.org/D93376), deciding what is a
>>>>>>>>>>> (syntactically
>>>>>>>>>>>>> &
>>>>>>>>>>>>>>>>>>
semantically) valid use of these intrinsics.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I'd
like to gather more context about the intrinsics.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 1. Is
there a frontend or framework that introduces lifetime call
>>>>>>>>>>>>> with
>>>>>>>>>>>>>>>>>>
non-stack allocated objects?
>>>>>>>>>>>>>>>>>> If
exists, which behavior do they expect from it?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 2. Can
a block's lifetime be started bytewise or elementwise?
>>>>>>>>>>>>>>>>>> I
imagine an optimization that allow using stack very compactly,
>>>>>>>>>>> but
>>>>>>>>>>>>>>>>> wonder
>>>>>>>>>>>>>>>>>> there
is a real-world use case.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>
Juneyoung
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
_______________________________________________
>>>>>>>>>>>>>>>>>> LLVM
Developers mailing list
>>>>>>>>>>>>>>>>>>
llvm-dev at lists.llvm.org
>>>>>>>>>>>>>>>>>>
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>
>>
-- 
Website: https://people.mpi-sws.org/~jung/

Juneyoung Lee via llvm-dev

2021-Jan-26 14:42 UTC

head link

[llvm-dev] lifetime.start/end

I want to avoid allowing 'pointer zapping' too; LLVM IR has a static
single
assignment form, and I think it might not be a good idea to add a special
rule that violates this.
Well, there might be some semantics that would explain optimizations that
we can imagine, but it introduces more complexity to the language
semantics by adding a special case.

BTW, is there any reason why allocas are being placed at the entry block?
If alloca was simply placed at the location where lifetime.start is
currently being placed, we don't need to struggle with this complication
about the semantics of lifetime.start..?

Thanks,
Juneyoung

On Tue, Jan 26, 2021 at 7:22 PM Ralf Jung <jung at mpi-sws.org> wrote:
> Hi Johannes,
>
> > On 1/24/21 10:07 AM, Ralf Jung wrote:
> >> Hi Johannes,
> >>
> >>> At some point I lost track of what you even want to achieve.
> >>
> >> *lol* fair.^^
> >> All I want is a precise and meaningful spec for lifetime.start and
> >> lifetime.end. That's where we entered this rabbit hole. You
didn't like
> >> Juneyoung's and Nuno's proposal for this and instead
suggested to
> replace the
> >> entire system by something else, and that new system is what we
are
> discussing
> >> now (that is my understanding, anyway).
> >>
> > I went back to the proposal in question,
https://reviews.llvm.org/D93376,
> and I
> > could not right away determine why this would avoid the inconsistency
we
> can
> > imagine for something like
> >
> > ```
> > int *G = 0;
> > void bar(int *p) __noinline__ { if (G) print(G == p); else G = p;
> > void foo() {
> >    intptr_t xp, yp;
> >    { int x; xp = &x; bar(&x); }
> >    { int y; yp = &y; bar(&y); }
> >    print(xp == yp);               // One can imagine this is folded to
> false and
> > because x and y are coalesced bar will print true.
> > }
> >
> > ```
> >
> > I was opposing the proposal because of the brittle syntactic
> restrictions. In my
> > attempt to justify not having those I also tried to resolve the above
> > inconsistency in a more general manner because I thought that was what
> it was
> > all about. I believe that we either have to argue the address of a
dead
> object
> > is unusable/undetermined, basically the direction of James comment, or
> we need
> > to decouple lifetime markers from locations.
>
> Folding to false and coalescing are inconsistent with each other, yes. It
> is
> just wrong to perform both of these on the above program, I would say (if
> we
> interpret it with LLVM semantics, which I assume has no "pointer
> zapping"). Is
> that up for debate?
>
> However, I do not see the immediate connection to lifetime markers -- the
> same
> would be true for a program using malloc/free. Coalescing is permissible
> whenever lifetimes are disjoint. Folding to false is only permissible if
> the
> pointers are *definitely not equal*, e.g. if their lifetimes overlap.
>
> >> Sure, everything in the semantics is up to what the program
actually
> observes.
> >> But I was considering the general case, in which it will not be
> possible to
> >> prove that some address is not observed. I expect that to be quite
> common
> >> (certainly, Rust programs really like to take references to local
> variables
> >> and pass them to other functions), and in those cases, not having
> >> lifetime.start/end will severely restrict program transformations
-- as
> you
> >> observed, even outlining is affected.
> >>
> >> This is basically the abstract version of the concrete concern
that
> James raised.
> >
> > I get the argument that it can hinder optimizations but it is
abstract.
> It seems
> > pretty simple to test this out, I mean to restrict the folding and the
> > coalescing. The benefit, which I think got ignored here a lot, is that
> it
> > provides a clear path towards deduction of coalescing opportunities,
> that is,
> > coalescing in the middle-end or backend w/o lifetime markers. Though,
> similar to
> > the impact of my proposal, I have no idea if that is even needed.
>
> It is certainly not simple for me to test out anything that would involve
> changing LLVM. ;)
> The syntactic condition also provides a clear path to evaluate whether any
> given
> folding or coalescing is correct. Any sufficiently precise semantics will
> do
> that. The question is just which semantics y'all want. If the semantics
> require
> "pointer zapping" to remain sufficiently optimizable, that sounds
like a
> serious
> drawback to me.
>
> Kind regards,
> Ralf
>
> >
> > ~ Johannes
> >
> >
> >>
> >> Kind regards,
> >> Ralf
> >>
> >>> A lot of my arguments around alloca/malloc/... is that we can
augment
> the
> >>> information from the frontend by determining if an address is
observed
> or
> >>> not. If it is not, we can limit its allocation, coalesce it,
etc.
> Lifetime
> >>> markers can help us do this but they are not the only way.
> >>> So, basically, we do not need lifetime markers to coalesce
> alloca/malloc/...
> >>> but that they can help.
> >>
> >>
> >>
> >>>
> >>> Let's try to get out of this rabbit hole. If you have a
problem with
> the
> >>> proposal as outlined in
> >>>
https://lists.llvm.org/pipermail/llvm-dev/2021-January/147594.html ,
> please
> >>> let me know. I assume dealing with observed addresses
different than
> with
> >>> non-observede ones (implied by noescape) is the way forward
now.
> >>>
> >>> ~ Johannes
> >>>
> >>>
> >>>
> >>>>
> >>>> Kind regards,
> >>>> Ralf
> >>>>
> >>>>
> >>>>>
> >>>>>
> >>>>>>
> >>>>>> Kind regards,
> >>>>>> Ralf
> >>>>>>
> >>>>>>>
> >>>>>>> Does that make some sense?
> >>>>>>>
> >>>>>>> ~ Johannes
> >>>>>>>
> >>>>>>>
> >>>>>>>>
> >>>>>>>> Kind regards,
> >>>>>>>> Ralf
> >>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On 1/7/21 10:12 PM, Juneyoung Lee
wrote:
> >>>>>>>>>> On Fri, Jan 8, 2021 at 8:54 AM
Johannes Doerfert
> >>>>>>>>>> <johannesdoerfert at
gmail.com>
> >>>>>>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> On 1/7/21 4:55 PM, Juneyoung
Lee wrote:
> >>>>>>>>>>>> It is, alloca must have
its integral address decided at its
> >>>>>>>>>>>> *allocation*
> >>>>>>>>>>>> time as well, like
0xFF00... .
> >>>>>>>>>>>>
> >>>>>>>>>>>> For example, if 0xFFFF0000
~ 0xFF000000 is already allocated,
> the new
> >>>>>>>>>>>> alloca cannot be placed
there unless it is not observed in
> the future,
> >>>>>>>>>>>> according to your
proposal.
> >>>>>>>>>>>>
> >>>>>>>>>>>> But how do we know from
the current state whether the stack
> variable is
> >>>>>>>>>>>> going to be observed or
not?
> >>>>>>>>>>> With that argument you could
allocate all but 4 bytes, do a
> malloc(4)
> >>>>>>>>>>> and know the address of the
returned pointer (assuming it is
> not null).
> >>>>>>>>>>>
> >>>>>>>>>>> What I try to say is, either
your scenario is part of the
> model and
> >>>>>>>>>>> everything we do is broken as
you could "observe" addresses
> passively,
> >>>>>>>>>>> *or*, the abstract machine we
use for semantic reasoning
> doesn't permit
> >>>>>>>>>>> the above reasoning. I really
hope for the latter.
> >>>>>>>>>>>
> >>>>>>>>>> That's a very valid point..!
> >>>>>>>>>>
> >>>>>>>>>> Actually, I have a memory model
that addresses this problem.
> >>>>>>>>>> The gist is that we can interpret
each allocation instruction
> as
> >>>>>>>>>> *creating
> >>>>>>>>>> 2 blocks* and nondeterministically
returning one of them.
> >>>>>>>>>> The other one is marked as invalid
but not freed immediately.
> >>>>>>>>>> Deallocation
> >>>>>>>>>> frees both blocks.
> >>>>>>>>>> If there is not enough space for
having two blocks, it is
> treated as
> >>>>>>>>>> out-of-memory.
> >>>>>>>>>>
> >>>>>>>>>> It makes guessing the address of
an allocation according to
> memory layout
> >>>>>>>>>> invalid UB.
> >>>>>>>>>>
> >>>>>>>>>> p = malloc(4)
> >>>>>>>>>> // If p != null, it is guaranteed
there was at least two 4 byte
> slots
> >>>>>>>>>> available, say 0x100~0x104 and
0x110~0x114.
> >>>>>>>>>> // Two blocks are allocated at
0x110 and 0x114, and one of the
> >>>>>>>>>> addresses is
> >>>>>>>>>> nondeterministically returned.
> >>>>>>>>>> // By the nature of
nondeterminism, all executions below should
> be
> >>>>>>>>>> well-defined regardless of the
address of p.
> >>>>>>>>>> *(int*)0x100 = 10; // In an
execution where p is 0x110, this
> raises UB.
> >>>>>>>>>>
> >>>>>>>>>> The paper link is here
<https://dl.acm.org/doi/10.1145/3276495>
> FYI.
> >>>>>>>>>
> >>>>>>>>> Nice! Thanks for the link :)
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>> To argue differently: Who is to
say there is a stack, or only
> one,
> >>>>>>>>>>> or that alloca allocates
memory "on the one stack"? That is
> not part of
> >>>>>>>>>>> the IR, IMHO. I can write a
machine on which alloca lowers to
> malloc,
> >>>>>>>>>>> I don't even need the free
during stack unwind but that I
> could do as
> >>>>>>>>>>> well if I wanted to.
> >>>>>>>>>>
> >>>>>>>>>> This is right, the example was for
illustrative purposes. IR
> does not
> >>>>>>>>>> enforce alloca to be placed at
'stack'.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>>> Making icmp/ptrtoint yield
poison will still make loop
> versioning or
> >>>>>>>>>>>> pointer rewriting
transformations unsound because these
> operations now
> >>>>>>>>>>> can
> >>>>>>>>>>>> create poison (even if
pointers are noundef).
> >>>>>>>>>>> I did not say they yield
poison, at least I did not try to say
> that.
> >>>>>>>>>>> What are you referring to
exactly?
> >>>>>>>>>>>
> >>>>>>>>>> I was referring to this:
> >>>>>>>>>>
> >>>>>>>>>>>> But this makes
ptrtoint/icmp make UB-raising instructions,
> which
> >>>>>>>>>>>> contradicts with what LLVM
does.
> >>>>>>>>>>> As with other violation of
attributes I would, on first
> though, suggest
> >>>>>>>>>>> to produce poison, not UB.
> >>>>>>>>>> But it is more about the
(imaginary) attribute, so maybe I was
> >>>>>>>>>> slightly out
> >>>>>>>>>> of topic.
> >>>>>>>>>>
> >>>>>>>>>>> 2) is fine, I think the
suggestion semantically makes sense
> >>>>>>>>>>> perfectly. 1)
> >>>>>>>>>>>> is something I'm
concerned about now.
> >>>>>>>>>>>>
> >>>>>>>>>>>> There are more than
pointer foldings, such as rewriting such
> >>>>>>>>>>>> expression,
> >>>>>>>>>>>> code motion ptr cmp,
introduce ptr cmp, etc. There is also
> analysis
> >>>>>>>>>>> relying
> >>>>>>>>>>>> on ptr cmp.
> >>>>>>>>>>>> Defining the correctness
of each of them is something we want
> to avoid,
> >>>>>>>>>>> and
> >>>>>>>>>>>> maybe that's why we
want to define precise semantics for
> things.
> >>>>>>>>>>> I don't get the point. My
proposal does not change the
> semantics of
> >>>>>>>>>>> pointer comparisons, at all. I
explicitly mentioned that in
> the last
> >>>>>>>>>>> email.
> >>>>>>>>>>>
> >>>>>>>>>> Oh okay, I thought it was a part
of the lifetime proposal, but
> it seems
> >>>>>>>>>> more like a separate thing.
> >>>>>>>>>> I agree that this requires
performance impact.
> >>>>>>>>>> Also investigation of existing
transformations would be needed;
> Alive2's
> >>>>>>>>>> pointer comparison is doing
approximation yet, but if it
> becomes fully
> >>>>>>>>>> precise, it will show something
from running LLVM unit tests I
> >>>>>>>>>> believe..! :)
> >>>>>>>>>>
> >>>>>>>>>>> I think this will be less
aggressive and may give nice
> feedback to
> >>>>>>>>>>>> potential projects that
are using lifetime with non-alloca.
> >>>>>>>>>>> The lifetime marker debate,
basically 2) above, is orthogonal
> to the
> >>>>>>>>>>> problem
> >>>>>>>>>>> you try to solve. It got mixed
in as lifetime markers were
> used by
> >>>>>>>>>>> StackColoring
> >>>>>>>>>>> to perform coalescing but that
is coincidental. You can
> (arguably)
> >>>>>>>>>>> coalesce stack
> >>>>>>>>>>> allocations regardless of
lifetime markers and with 1) such a
> >>>>>>>>>>> transformation
> >>>>>>>>>>> (w/ and w/o lifetime markers)
would actually be sound.
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>> At the end, we can
implement IR writer that lowers lifetime
> with
> >>>>>>>>>>> non-alloca
> >>>>>>>>>>>> into memset(undef). WDYT?
> >>>>>>>>>>> Yeah, 2) is orthogonal and we
can lower it that way. Unsure if
> it is
> >>>>>>>>>>> helpful
> >>>>>>>>>>> but we can certainly define it
that way in the LangRef.
> >>>>>>>>>>>
> >>>>>>>>>> Okay, thanks!
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>> ~ Johannes
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>> Thanks,
> >>>>>>>>>>>> Juneyoung
> >>>>>>>>>>>>
> >>>>>>>>>>>> p.s. The reply was late,
sorry. I think I can spend more time
> on this
> >>>>>>>>>>> today.
> >>>>>>>>>>>> On Thu, Jan 7, 2021 at
9:02 AM Johannes Doerfert <
> >>>>>>>>>>> johannesdoerfert at
gmail.com>
> >>>>>>>>>>>> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> On 1/6/21 4:33 PM,
Juneyoung Lee wrote:
> >>>>>>>>>>>>>>>> Stepwisely
defining the semantics of instructions is a
> desirable
> >>>>>>>>>>>>>> direction
> >>>>>>>>>>>>>>>> IMO.
> >>>>>>>>>>>>>>> I'm
confused. What in the proposal would prevent us from
> defining
> >>>>>>>>>>>>>>> the semantics
of instructions, or force us to do it in an
> >>>>>>>>>>>>>>>
"undesirable
> >>>>>>>>>>>>>> way"?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I meant it would
be great if the output state after
> executing an
> >>>>>>>>>>>>>> instruction can be
described using its input state.
> >>>>>>>>>>>>>> (that was the
meaning of 'stepwise semantics', I should
> have been
> >>>>>>>>>>>>>> more
> >>>>>>>>>>>>>> clear about this)
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> For example, the
semantics of 'z = add x y' can be defined
> as
> >>>>>>>>>>>>>> follows:
> >>>>>>>>>>>>>> Given an input
state s, next state s' = s[z -> s(x) + s(y)]
> >>>>>>>>>>>>>> where s(x) is the
value of x in the previous state, and s[z
> -> v]
> >>>>>>>>>>>>>> is a
> >>>>>>>>>>>>>> state with z
updated to v.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Another example
that involves memory: the semantics of 'i >
>>>>>>>>>>>>>> ptrtoint p'
> >>>>>>>>>>>>> can
> >>>>>>>>>>>>>> be defined as
follows:
> >>>>>>>>>>>>>> Given an input
state s, next state s' = s[i ->
> s(p).obj.address +
> >>>>>>>>>>>>>> s(p).offset]
> >>>>>>>>>>>>>> where obj.address
is the begin address of a memory object
> obj pointed
> >>>>>>>>>>> by
> >>>>>>>>>>>>> p
> >>>>>>>>>>>>>> & offset is
p's byte offset. (Imagine a pointer to the
> offset of some
> >>>>>>>>>>>>> char
> >>>>>>>>>>>>>> array).
> >>>>>>>>>>>>>> Note that ptrtoint
& add can be nicely defined w.r.t the
> input state.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Now, the
instruction that we're facing is 'p = alloca'.
> >>>>>>>>>>>>>> To describe the
output state after executing 'p = alloca',
> the
> >>>>>>>>>>>>>> address
> >>>>>>>>>>> of
> >>>>>>>>>>>>>> new alloca should
be determined.
> >>>>>>>>>>>>>> If observedness is
involved, we need to know the future
> state
> >>>>>>>>>>>>>> again. :/
> >>>>>>>>>>>>> We
> >>>>>>>>>>>>>> don't know
whether the alloca is going to be observed or
> not without
> >>>>>>>>>>>>> seeing
> >>>>>>>>>>>>>> the future.
> >>>>>>>>>>>>>> This is the
problem of the current lifetime intrinsics as
> well.
> >>>>>>>>>>>>> No, you mix things up
here.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Nobody proposed to
modify the semantics of `alloca`.
> >>>>>>>>>>>>> `alloca` provides you
with a fresh, unobserved block of
> >>>>>>>>>>>>> dereferenceable memory
that is implicitly freed as the stack
> >>>>>>>>>>>>> unwinds. That is it.
No context necessary.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> If you want to modify
the IR, you need to argue the
> observable
> >>>>>>>>>>>>> semantics which is
nothing new. That this might require more
> than
> >>>>>>>>>>>>> a peephole view of the
program is also not new.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> One possible
approach to resolve this is adding an
> 'unobserved'
> >>>>>>>>>>>>>> flag to
> >>>>>>>>>>>>>> alloca instruction
(similar to what was suggested by
> Nicolai).
> >>>>>>>>>>>>>> And, we can say
that if alloca with 'unobserved' is used by
> >>>>>>>>>>>>> ptrtoint/icmp,
> >>>>>>>>>>>>>> it is UB.
> >>>>>>>>>>>>> The flag can be added,
like we add other attributes. It
> should not
> >>>>>>>>>>>>> be required for any
optimization we talked about though. It
> basically
> >>>>>>>>>>>>> is a way to manifest
derived or given information into the
> IR.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Attribute deduction,
as well as frontends with domain
> knowledge,
> >>>>>>>>>>>>> can add such
information. The flag we discussed in phab was
> not even
> >>>>>>>>>>>>> sufficient for all the
transformation examples I presented
> in my mail,
> >>>>>>>>>>>>> that is why I extended
my  argument. We could still have a
> "noescape"
> >>>>>>>>>>>>> flag for allocas, but
I'm not sure how useful that really
> is. We can
> >>>>>>>>>>>>> certainly deduce it
and manifest it, unsure if we have
> domain
> >>>>>>>>>>>>> knowledge
> >>>>>>>>>>>>> we can use for
non-trivial cases though.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> But this makes
ptrtoint/icmp make UB-raising instructions,
> which
> >>>>>>>>>>>>>> contradicts with
what LLVM does.
> >>>>>>>>>>>>> As with other
violation of attributes I would, on first
> though,
> >>>>>>>>>>>>> suggest
> >>>>>>>>>>>>> to produce poison, not
UB.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> Also, existing
optimizations like loop versioning can
> introduce
> >>>>>>>>>>>>>> ptrtoint/pointer
comparisons too.
> >>>>>>>>>>>>> Sure. I am not certain
why that is a problem. I get the
> feeling things
> >>>>>>>>>>>>> are still mixed up
here.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> What I proposed is
twofold:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> 1) We stop folding
comparisons between different allocas if
> >>>>>>>>>>>>> changing the
> >>>>>>>>>>>>> address
> >>>>>>>>>>>>>       of both might be
observable. Thus, if both might have
> their
> >>>>>>>>>>>>> address
> >>>>>>>>>>>>>
"taken"/escaped,
> >>>>>>>>>>>>>       other than the
comparisons we want to fold, we cannot
> proceed.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> 2) We define lifetime
markers to mean `memset(undef)`.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> The first should be
sufficient for the problem you were
> trying to
> >>>>>>>>>>>>> solve
> >>>>>>>>>>>>> in the
> >>>>>>>>>>>>> first place. The
second makes lifetime markers less weird.
> Note
> >>>>>>>>>>>>> that 1)
> >>>>>>>>>>>>> is not changing
> >>>>>>>>>>>>> the semantics of the
IR. We basically just argue there is a
> bug in our
> >>>>>>>>>>>>> instcombine right
> >>>>>>>>>>>>> now as we do not check
all necessary preconditions.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> I see that there
are other questions that I didn't answer
> yet, but
> >>>>>>>>>>>>>> let
> >>>>>>>>>>> me
> >>>>>>>>>>>>>> answer this first
to limit the length of the text :)
> >>>>>>>>>>>>> Sure, we can split the
discussion :)
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> ~ Johannes
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>> Juneyoung
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Thu, Jan 7,
2021 at 3:36 AM Johannes Doerfert <
> >>>>>>>>>>>>> johannesdoerfert at
gmail.com>
> >>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On 1/5/21 8:00
PM, Juneyoung Lee wrote:
> >>>>>>>>>>>>>>>> Hi
Johannes,
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> I read
your proposal and thought about the model.
> >>>>>>>>>>>>>>> Cool, thanks!
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> As you
concerned in A3, certain programs may be valid
> only when
> >>>>>>>>>>> memory
> >>>>>>>>>>>>>>>> blocks
with overlapping lifetimes have disjoint addresses.
> >>>>>>>>>>>>>>>> Look at
this example (I'm using malloc, but alloca also
> works):
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> p1 =
malloc(4)
> >>>>>>>>>>>>>>>> p2 =
malloc(4) // for brevity, assume that there as
> enough space
> >>>>>>>>>>>>>>>> & p1
> >>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>> p2 != null
> >>>>>>>>>>>>>>>>
set<char*> s;
> >>>>>>>>>>>>>>>>
s.insert(p1); s.insert(p2); // If the second insert did
> nothing, it
> >>>>>>>>>>>>> would
> >>>>>>>>>>>>>>>> be
surprise to programmers
> >>>>>>>>>>>>>>>> work(s);
> >>>>>>>>>>>>>>>>
free(data1)
> >>>>>>>>>>>>>>>>
free(data2)
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Clearly,
IR semantics should guarantee that escaped
> blocks are
> >>>>>>>>>>>>> disjoint.
> >>>>>>>>>>>>>>> It
> >>>>>>>>>>>>>>>> would be
great for verification tools on LLVM IR to be
> able to
> >>>>>>>>>>>>>>>> answer
> >>>>>>>>>>>>>>> that
> >>>>>>>>>>>>>>>> the second
insert will succeed.
> >>>>>>>>>>>>>>> I agree, the
second insert should succeed, assuming `p1 &&
> p2`.
> >>>>>>>>>>>>>>> I don't
think my proposal would in any way impact the
> program above,
> >>>>>>>>>>>>>>> if anything
the A3 reasoning makes sure such a program
> with allocas
> >>>>>>>>>>>>>>> is not
miscompiled.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> I'm also
not sure I understand what you try to argue for.
> Maybe
> >>>>>>>>>>>>>>> elaborate a
bit what it is you think is bad or needs to be
> changed.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> The
problem is that definition of escapedness is not
> clear at the
> >>>>>>>>>>>>>>> semantic
> >>>>>>>>>>>>>>>> level.
Describing the IR semantics w.r.t. LLVM's escaped
> analysis
> >>>>>>>>>>>>>>> function
> >>>>>>>>>>>>>>>> isn't
something we want.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Semantic
definition of escapedness of a pointer seems
> hard, I mean
> >>>>>>>>>>> in a
> >>>>>>>>>>>>>>>> stepwise
manner.
> >>>>>>>>>>>>>>>> It
isn't a single instruction such as 'escape i8* ptr',
> and we need
> >>>>>>>>>>> to
> >>>>>>>>>>>>>>> look
> >>>>>>>>>>>>>>>> over all
instructions in the function. For example,
> '(int)(p+1) -
> >>>>>>>>>>>>> (int)p'
> >>>>>>>>>>>>>>>> isn't
semantically escaping the pointer p because the
> result is 1
> >>>>>>>>>>>>>>>> regardless
of the value of p.
> >>>>>>>>>>>>>>>> Stepwisely
defining the semantics of instructions is a
> desirable
> >>>>>>>>>>>>>>> direction
> >>>>>>>>>>>>>>>> IMO.
> >>>>>>>>>>>>>>> I'm
confused. What in the proposal would prevent us from
> defining
> >>>>>>>>>>>>>>> the semantics
of instructions, or force us to do it in an
> >>>>>>>>>>>>>>>
"undesirable
> >>>>>>>>>>>>>>> way"?
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> In
practice, syntactically checking escapedness + nice
> engineering
> >>>>>>>>>>>>> might
> >>>>>>>>>>>>>>>> not break
optimizations in most cases (as undef/poison
> did); but it
> >>>>>>>>>>>>> would
> >>>>>>>>>>>>>>>> be great
to move to another level, since LLVM IR is used
> in so many
> >>>>>>>>>>>>>>> places
> >>>>>>>>>>>>>>>> :)
> >>>>>>>>>>>>>>> The property
under which you can coalesce objects is
> simple:
> >>>>>>>>>>>>>>>       It is
not observable.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Now, if the
address of one of the two objects you coalesce
> is not
> >>>>>>>>>>>>>>> observed,
coalescing is not observable. That is a
> sufficient
> >>>>>>>>>>>>>>> condition
> >>>>>>>>>>>>>>> not a
necessary one. Pointer "escaping" is one step
> further. If the
> >>>>>>>>>>>>>>> address
doesn't escape it is not observed. This does not
> mean the
> >>>>>>>>>>>>>>> "semantic
conditions for coalescing", e.g., for the
> purpose of
> >>>>>>>>>>>>> translation
> >>>>>>>>>>>>>>> validation, is
supposed to be build on top of our
> "definition of
> >>>>>>>>>>>>> escaping
> >>>>>>>>>>>>>>>
pointers". That said, we use "does not escape" as a
> precondition for
> >>>>>>>>>>>>>>> various
transformation and I'm unsure what is any
> different now. The
> >>>>>>>>>>>>>>> entire escape
argument is only used in the validity of the
> pointer
> >>>>>>>>>>>>> folding.
> >>>>>>>>>>>>>>> Similarly, we
can fold a comparison of a noalias pointer
> with
> >>>>>>>>>>>>>>> another
> >>>>>>>>>>>>> one
> >>>>>>>>>>>>>>> if the former
does not escape (and both are dereferenced
> and one is
> >>>>>>>>>>>>>>> written for
sure).
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> The
pointer comparison is another beast. It indeed has a
> few
> >>>>>>>>>>>>>>>> issues,
> >>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>> solving it
might require nontrivial solution.
> >>>>>>>>>>>>>>> I think the
main problem of the inconsistencies and such
> we've
> >>>>>>>>>>>>>>> seen is
> >>>>>>>>>>>>>>> rooted in the
erroneous folding of pointer comparisons.
> Cleaning up
> >>>>>>>>>>> the
> >>>>>>>>>>>>>>> lifetime
marker semantics is actually unrelated and simply
> not
> >>>>>>>>>>>>>>> folding
> >>>>>>>>>>>>>>> as described
in A3 should solve the issue that has been
> reported.
> >>>>>>>>>>>>>>> Nevertheless,
> >>>>>>>>>>>>>>> we should take
a crack at lifetime markers while we are
> here.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> ~ Johannes
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>>>> Juneyoung
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On Tue,
Jan 5, 2021 at 9:37 AM Johannes Doerfert <
> >>>>>>>>>>>>>>>
johannesdoerfert at gmail.com>
> >>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Hi
Juneyoung,
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Happy
new year :)
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> After
we had a lengthy discussion on phab last year,
> I've tried to
> >>>>>>>>>>>>>>>>>
summarize my thoughts,
> >>>>>>>>>>>>>>>>>
especially given that I had some time to think about
> things
> >>>>>>>>>>>>>>>>> over
the
> >>>>>>>>>>>>>>> break.
> >>>>>>>>>>>>>>>>> Still,
no promises on the quality ;)
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> I
start with general questions I asked myself and then
> go on
> >>>>>>>>>>> rambling
> >>>>>>>>>>>>>>>>> about
a potential design.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Q1: Is
lifetime a given property or a derived one, thus
> is it
> >>>>>>>>>>>>>>>>> fixed
> >>>>>>>>>>> or
> >>>>>>>>>>>>>>>>>
modifiable?
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> This
is a question I asked myself a lot recently. I
> think it is
> >>>>>>>>>>>>> derived
> >>>>>>>>>>>>>>>>> and
modifiable,
> >>>>>>>>>>>>>>>>> at
least I hope it is. Only that would allow
> transformations I
> >>>>>>>>>>>>>>>>> would
> >>>>>>>>>>>>>>> want
> >>>>>>>>>>>>>>>>> us to
do. Here are some examples:
> >>>>>>>>>>>>>>>>>       
https://godbolt.org/z/G8obj3
> >>>>>>>>>>>>>>>>>       
https://godbolt.org/z/obaTc
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Q2: Is
a pointer comparison, or similar use, extending
> the
> >>>>>>>>>>>>>>>>>
lifetime?
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Asked
differently, can we hoist a pointer comparison
> into a region
> >>>>>>>>>>>>> where
> >>>>>>>>>>>>>>>>> the
pointer is dead?
> >>>>>>>>>>>>>>>>> This
is an important question which we haven't discussed
> much
> >>>>>>>>>>>>>>>>> as we
> >>>>>>>>>>>>>>>>>
assumed LICM has to work.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> The
current behavior is that non-dereferencing uses are
> not
> >>>>>>>>>>> extending
> >>>>>>>>>>>>>>>>> the
lifetime and are
> >>>>>>>>>>>>>>>>>
allowed outside of "lifetime regions" (as indicated by
> markers).
> >>>>>>>>>>> They
> >>>>>>>>>>>>>>>>> will
always produce valid
> >>>>>>>>>>>>>>>>>
results. Though, we might want to think about a lifetime
> marker
> >>>>>>>>>>>>>>>>> that
> >>>>>>>>>>>>>>>>> spits
out a new pointer
> >>>>>>>>>>>>>>>>> value
instead of reusing the old one.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Q3:
Can we use lifetime to argue about addresses?
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> The
question here is, can we fold address comparisons
> based on
> >>>>>>>>>>>>>>>>>
lifetimes, or not.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> So
far, we fold comparisons based on "address
> information". For
> >>>>>>>>>>>>> example,
> >>>>>>>>>>>>>>>>> we
"know" globals,
> >>>>>>>>>>>>>>>>>
allocas, and mallocs cannot be equal to one another.
> Also, two
> >>>>>>>>>>>>> distinct
> >>>>>>>>>>>>>>>>>
allocations, for globals
> >>>>>>>>>>>>>>>>> and
allocas, are considered unequal. Now, the crux is
> that we have
> >>>>>>>>>>> to
> >>>>>>>>>>>>> be
> >>>>>>>>>>>>>>>>>
consistent if we do two
> >>>>>>>>>>>>>>>>>
comparisons, and, as of now, we are not (bug number
> missing).
> >>>>>>>>>>>>>>>>> Since
> >>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>
backend (or any other place
> >>>>>>>>>>>>>>>>> for
that matter) might coalesce allocas, their addresses
> might not
> >>>>>>>>>>> be
> >>>>>>>>>>>>>>>>>
different after all. If we
> >>>>>>>>>>>>>>>>>
already folded a comparison to "unequal" we are doomed
> if we later
> >>>>>>>>>>>>> have
> >>>>>>>>>>>>>>>>> a
comparison that results
> >>>>>>>>>>>>>>>>> in
"equal". (Note, this is different from aliasing rules
> as
> >>>>>>>>>>>>>>>>> they
can
> >>>>>>>>>>>>> be
> >>>>>>>>>>>>>>>>>
stricter.)
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
Design:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> I
would hope we can come up with a model that treats
> memory "the
> >>>>>>>>>>>>> same",
> >>>>>>>>>>>>>>>>>
regardless if it is global,
> >>>>>>>>>>>>>>>>> stack,
or heap. I want to avoid special casing one of
> them wrt.
> >>>>>>>>>>>>> lifetime
> >>>>>>>>>>>>>>>>> as I
believe most optimizations
> >>>>>>>>>>>>>>>>> would
apply to any of them, potentially for different
> reasons and
> >>>>>>>>>>> with
> >>>>>>>>>>>>>>>>>
different gains, but nevertheless.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
Proposal (largely based on the conversation in phab):
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> A1:
Lifetime is a concept that talks about memory
> content *only*.
> >>>>>>>>>>>>>>>>>
Basically, the memory content is set to
> >>>>>>>>>>>>>>>>>       
undefined by lifetime markers. It is
> >>>>>>>>>>>>>>>>>
derived/modifiable as
> >>>>>>>>>>> it
> >>>>>>>>>>>>> is
> >>>>>>>>>>>>>>>>> just
an "as-is" property of the memory
> >>>>>>>>>>>>>>>>>       
content. The lifetimes of an object, as
> described by
> >>>>>>>>>>> markers,
> >>>>>>>>>>>>>>> might
> >>>>>>>>>>>>>>>>> change
during the compilation. They
> >>>>>>>>>>>>>>>>>       
might become smaller if we deduce the object is
> not
> >>>>>>>>>>>>>>>>>
accessed
> >>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>> the
memory content is not used, they
> >>>>>>>>>>>>>>>>>       
might become larger if objects with
> non-overlapping
> >>>>>>>>>>> lifetimes
> >>>>>>>>>>>>> are
> >>>>>>>>>>>>>>>>>
coalesced. (One could see the latter as
> >>>>>>>>>>>>>>>>>       
introducing a new object though.)
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> A2: If
we define lifetime as above, it has nothing to do
> with the
> >>>>>>>>>>>>>>>>>
address of an object. Consequently, pointer
> >>>>>>>>>>>>>>>>>       
comparisons and similar operations are valid
> outside the
> >>>>>>>>>>>>> lifetime.
> >>>>>>>>>>>>>>>>> Loads
and stores are as well, they can
> >>>>>>>>>>>>>>>>>       
even not be removed "easily". A store followed
> by a
> >>>>>>>>>>>>>>>>>
lifetime
> >>>>>>>>>>>>>>> marker
> >>>>>>>>>>>>>>>>> or a
load following a lifetime marker
> >>>>>>>>>>>>>>>>>       
is dead or results in undef respectively.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> A3: We
could not use lifetime to argue about addresses.
> This means
> >>>>>>>>>>> we
> >>>>>>>>>>>>>>>>>
could/should also not argue that overlapping
> >>>>>>>>>>>>>>>>>       
lifetimes result in different addresses. Thus,
> a
> >>>>>>>>>>>>>>>>>
comparison
> >>>>>>>>>>>>>>> between
> >>>>>>>>>>>>>>>>> the
address of two allocas could not
> >>>>>>>>>>>>>>>>>       
immediately be folded. That said, there would
> be special
> >>>>>>>>>>> cases
> >>>>>>>>>>>>>>>>>
though. Basically, if one of the allocas does
> >>>>>>>>>>>>>>>>>       
not escape, other than the comparisons to be
> folded,
> >>>>>>>>>>>>>>>>> we can
> >>>>>>>>>>>>> fold
> >>>>>>>>>>>>>>>>> them.
Afterwards, coalescing or splitting
> >>>>>>>>>>>>>>>>>       
would still be consistent because it is
> unobservable. The
> >>>>>>>>>>>>> problem
> >>>>>>>>>>>>>>>>> we
have in-tree is that we fold even though
> >>>>>>>>>>>>>>>>>       
the address is still observed (after the fold).
> It is
> >>>>>>>>>>>>>>>>> still
> >>>>>>>>>>>>>>> unclear
> >>>>>>>>>>>>>>>>> to me
what the impact of this would be
> >>>>>>>>>>>>>>>>>       
on real code. I suggested before that we run
> some
> >>>>>>>>>>> experiments
> >>>>>>>>>>>>>>> first
> >>>>>>>>>>>>>>>>> before
we make any decision whatsoever.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> This
is pretty much saying that lifetime markers are
> >>>>>>>>>>> `memset(undef)`,
> >>>>>>>>>>>>> as
> >>>>>>>>>>>>>>>>> you
suggested before (I think).
> >>>>>>>>>>>>>>>>> There
are some implementation level differences but at
> the end of
> >>>>>>>>>>> the
> >>>>>>>>>>>>>>>>> day
they are basically the same.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Happy
to hear some thoughts on this, especially if it
> fixes the
> >>>>>>>>>>>>> problems
> >>>>>>>>>>>>>>>>> that
lead to D93376 in the first place.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> ~
Johannes
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> On
12/18/20 2:42 AM, Juneyoung Lee via llvm-dev wrote:
> >>>>>>>>>>>>>>>>>>
Hello all,
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
We're discussing the well-formedness of
> lifetime.start/end
> >>>>>>>>>>> intrinsic
> >>>>>>>>>>>>>>>>> here (
> >>>>>>>>>>>>>>>>>>
https://reviews.llvm.org/D93376), deciding what is a
> >>>>>>>>>>> (syntactically
> >>>>>>>>>>>>> &
> >>>>>>>>>>>>>>>>>>
semantically) valid use of these intrinsics.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
I'd like to gather more context about the intrinsics.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> 1.
Is there a frontend or framework that introduces
> lifetime call
> >>>>>>>>>>>>> with
> >>>>>>>>>>>>>>>>>>
non-stack allocated objects?
> >>>>>>>>>>>>>>>>>> If
exists, which behavior do they expect from it?
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> 2.
Can a block's lifetime be started bytewise or
> elementwise?
> >>>>>>>>>>>>>>>>>> I
imagine an optimization that allow using stack very
> compactly,
> >>>>>>>>>>> but
> >>>>>>>>>>>>>>>>> wonder
> >>>>>>>>>>>>>>>>>>
there is a real-world use case.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
Thanks,
> >>>>>>>>>>>>>>>>>>
Juneyoung
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
_______________________________________________
> >>>>>>>>>>>>>>>>>>
LLVM Developers mailing list
> >>>>>>>>>>>>>>>>>>
llvm-dev at lists.llvm.org
> >>>>>>>>>>>>>>>>>>
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> >>>>>>>>>>
> >>>>>>>>
> >>>>>>
> >>>>
> >>
>
> --
> Website: https://people.mpi-sws.org/~jung/
>

-- 

Juneyoung Lee
Software Foundation Lab, Seoul National University
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20210126/3fbe2754/attachment-0001.html>

Johannes Doerfert via llvm-dev

2021-Jan-26 14:59 UTC

head link

[llvm-dev] lifetime.start/end

On 1/26/21 4:22 AM, Ralf Jung wrote:> Hi Johannes,
>
>> On 1/24/21 10:07 AM, Ralf Jung wrote:
>>> Hi Johannes,
>>>
>>>> At some point I lost track of what you even want to achieve.
>>>
>>> *lol* fair.^^
>>> All I want is a precise and meaningful spec for lifetime.start and 
>>> lifetime.end. That's where we entered this rabbit hole. You
didn't
>>> like Juneyoung's and Nuno's proposal for this and instead
suggested
>>> to replace the entire system by something else, and that new system
>>> is what we are discussing now (that is my understanding, anyway).
>>>
>> I went back to the proposal in question, 
>> https://reviews.llvm.org/D93376, and I could not right away determine 
>> why this would avoid the inconsistency we can imagine for something
like
>>
>> ```
>> int *G = 0;
>> void bar(int *p) __noinline__ { if (G) print(G == p); else G = p;
>> void foo() {
>>    intptr_t xp, yp;
>>    { int x; xp = &x; bar(&x); }
>>    { int y; yp = &y; bar(&y); }
>>    print(xp == yp);               // One can imagine this is folded 
>> to false and because x and y are coalesced bar will print true.
>> }
>>
>> ```
>>
>> I was opposing the proposal because of the brittle syntactic 
>> restrictions. In my attempt to justify not having those I also tried 
>> to resolve the above inconsistency in a more general manner because I 
>> thought that was what it was all about. I believe that we either have 
>> to argue the address of a dead object is unusable/undetermined, 
>> basically the direction of James comment, or we need to decouple 
>> lifetime markers from locations.
>
> Folding to false and coalescing are inconsistent with each other, yes. 
> It is just wrong to perform both of these on the above program, I 
> would say (if we interpret it with LLVM semantics, which I assume has 
> no "pointer zapping"). Is that up for debate?
>
> However, I do not see the immediate connection to lifetime markers -- 
> the same would be true for a program using malloc/free. Coalescing is 
> permissible whenever lifetimes are disjoint. Folding to false is only 
> permissible if the pointers are *definitely not equal*, e.g. if their 
> lifetimes overlap.
>I still believe folding can be done regardless of their lifetime, as 
long as it is done consistent. Fold all pointer observations and the actual
placement is not observable, we went down that road.

I agree with the malloc/free comparison, I did argue that the entire 
time, or I tried.

If there is no/little connection to lifetime markers, I assume 
https://reviews.llvm.org/D93376 did not fix the inconsistency then. It was,
apparently, just an approach to clarify what a lifetime marker is, 
right? I might be more confused than before. I believe you should equate
lifetime markers and memset(undef) and put the entire discussion about 
folding/coalescing onto a foundation that is unrelated to lifetime
markers.


>>> Sure, everything in the semantics is up to what the program
actually
>>> observes. But I was considering the general case, in which it will 
>>> not be possible to prove that some address is not observed. I
expect
>>> that to be quite common (certainly, Rust programs really like to 
>>> take references to local variables and pass them to other 
>>> functions), and in those cases, not having lifetime.start/end will 
>>> severely restrict program transformations -- as you observed, even 
>>> outlining is affected.
>>>
>>> This is basically the abstract version of the concrete concern that
>>> James raised.
>>
>> I get the argument that it can hinder optimizations but it is 
>> abstract. It seems pretty simple to test this out, I mean to restrict 
>> the folding and the coalescing. The benefit, which I think got 
>> ignored here a lot, is that it provides a clear path towards 
>> deduction of coalescing opportunities, that is, coalescing in the 
>> middle-end or backend w/o lifetime markers. Though, similar to the 
>> impact of my proposal, I have no idea if that is even needed.
>
> It is certainly not simple for me to test out anything that would 
> involve changing LLVM. ;)
@Juneyoung is great at doing that ;)

> The syntactic condition also provides a clear path to evaluate whether 
> any given folding or coalescing is correct. 
I disagree. The syntactic conditions, at least as I read them, allow the 
above program with lifetime markers (as we have it now) but did not
suggest it would be wrong to fold and later coalesce. Or maybe I missed 
why the syntactic restrictions would prevent that, please explain then.

> Any sufficiently precise semantics will do that. The question is just 
> which semantics y'all want. If the semantics require "pointer
zapping"
> to remain sufficiently optimizable, that sounds like a serious 
> drawback to me.
I don't think anyone wanted that. TBH, I think my proposal is by far the 
most compatible one wrt. the "classic" model w/o lifetime markers,
given
that I proposed not to entangle them in object allocation. Everything 
else I proposed are "as-if" transformations. Do you disagree?

~ Johannes


>
> Kind regards,
> Ralf
>
>>
>> ~ Johannes
>>
>>
>>>
>>> Kind regards,
>>> Ralf
>>>
>>>> A lot of my arguments around alloca/malloc/... is that we can 
>>>> augment the information from the frontend by determining if an 
>>>> address is observed or not. If it is not, we can limit its 
>>>> allocation, coalesce it, etc. Lifetime markers can help us do
this
>>>> but they are not the only way.
>>>> So, basically, we do not need lifetime markers to coalesce 
>>>> alloca/malloc/... but that they can help.
>>>
>>>
>>>
>>>>
>>>> Let's try to get out of this rabbit hole. If you have a
problem
>>>> with the proposal as outlined in 
>>>>
https://lists.llvm.org/pipermail/llvm-dev/2021-January/147594.html
>>>> , please let me know. I assume dealing with observed addresses 
>>>> different than with non-observede ones (implied by noescape) is
the
>>>> way forward now.
>>>>
>>>> ~ Johannes
>>>>
>>>>
>>>>
>>>>>
>>>>> Kind regards,
>>>>> Ralf
>>>>>
>>>>>
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> Kind regards,
>>>>>>> Ralf
>>>>>>>
>>>>>>>>
>>>>>>>> Does that make some sense?
>>>>>>>>
>>>>>>>> ~ Johannes
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Kind regards,
>>>>>>>>> Ralf
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 1/7/21 10:12 PM, Juneyoung Lee
wrote:
>>>>>>>>>>> On Fri, Jan 8, 2021 at 8:54 AM
Johannes Doerfert
>>>>>>>>>>> <johannesdoerfert at
gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> On 1/7/21 4:55 PM, Juneyoung
Lee wrote:
>>>>>>>>>>>>> It is, alloca must have its
integral address decided at
>>>>>>>>>>>>> its *allocation*
>>>>>>>>>>>>> time as well, like
0xFF00... .
>>>>>>>>>>>>>
>>>>>>>>>>>>> For example, if 0xFFFF0000
~ 0xFF000000 is already
>>>>>>>>>>>>> allocated, the new
>>>>>>>>>>>>> alloca cannot be placed
there unless it is not observed in
>>>>>>>>>>>>> the future,
>>>>>>>>>>>>> according to your proposal.
>>>>>>>>>>>>>
>>>>>>>>>>>>> But how do we know from the
current state whether the
>>>>>>>>>>>>> stack variable is
>>>>>>>>>>>>> going to be observed or
not?
>>>>>>>>>>>> With that argument you could
allocate all but 4 bytes, do a
>>>>>>>>>>>> malloc(4)
>>>>>>>>>>>> and know the address of the
returned pointer (assuming it
>>>>>>>>>>>> is not null).
>>>>>>>>>>>>
>>>>>>>>>>>> What I try to say is, either
your scenario is part of the
>>>>>>>>>>>> model and
>>>>>>>>>>>> everything we do is broken as
you could "observe" addresses
>>>>>>>>>>>> passively,
>>>>>>>>>>>> *or*, the abstract machine we
use for semantic reasoning
>>>>>>>>>>>> doesn't permit
>>>>>>>>>>>> the above reasoning. I really
hope for the latter.
>>>>>>>>>>>>
>>>>>>>>>>> That's a very valid point..!
>>>>>>>>>>>
>>>>>>>>>>> Actually, I have a memory model
that addresses this problem.
>>>>>>>>>>> The gist is that we can interpret
each allocation
>>>>>>>>>>> instruction as *creating
>>>>>>>>>>> 2 blocks* and nondeterministically
returning one of them.
>>>>>>>>>>> The other one is marked as invalid
but not freed
>>>>>>>>>>> immediately. Deallocation
>>>>>>>>>>> frees both blocks.
>>>>>>>>>>> If there is not enough space for
having two blocks, it is
>>>>>>>>>>> treated as
>>>>>>>>>>> out-of-memory.
>>>>>>>>>>>
>>>>>>>>>>> It makes guessing the address of an
allocation according to
>>>>>>>>>>> memory layout
>>>>>>>>>>> invalid UB.
>>>>>>>>>>>
>>>>>>>>>>> p = malloc(4)
>>>>>>>>>>> // If p != null, it is guaranteed
there was at least two 4
>>>>>>>>>>> byte slots
>>>>>>>>>>> available, say 0x100~0x104 and
0x110~0x114.
>>>>>>>>>>> // Two blocks are allocated at
0x110 and 0x114, and one of
>>>>>>>>>>> the addresses is
>>>>>>>>>>> nondeterministically returned.
>>>>>>>>>>> // By the nature of nondeterminism,
all executions below
>>>>>>>>>>> should be
>>>>>>>>>>> well-defined regardless of the
address of p.
>>>>>>>>>>> *(int*)0x100 = 10; // In an
execution where p is 0x110, this
>>>>>>>>>>> raises UB.
>>>>>>>>>>>
>>>>>>>>>>> The paper link is here 
>>>>>>>>>>>
<https://dl.acm.org/doi/10.1145/3276495> FYI.
>>>>>>>>>>
>>>>>>>>>> Nice! Thanks for the link :)
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> To argue differently: Who is to say
there is a stack, or
>>>>>>>>>>> only one,
>>>>>>>>>>>> or that alloca allocates memory
"on the one stack"? That is
>>>>>>>>>>>> not part of
>>>>>>>>>>>> the IR, IMHO. I can write a
machine on which alloca lowers
>>>>>>>>>>>> to malloc,
>>>>>>>>>>>> I don't even need the free
during stack unwind but that I
>>>>>>>>>>>> could do as
>>>>>>>>>>>> well if I wanted to.
>>>>>>>>>>>
>>>>>>>>>>> This is right, the example was for
illustrative purposes. IR
>>>>>>>>>>> does not
>>>>>>>>>>> enforce alloca to be placed at
'stack'.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>> Making icmp/ptrtoint yield
poison will still make loop
>>>>>>>>>>>>> versioning or
>>>>>>>>>>>>> pointer rewriting
transformations unsound because these
>>>>>>>>>>>>> operations now
>>>>>>>>>>>> can
>>>>>>>>>>>>> create poison (even if
pointers are noundef).
>>>>>>>>>>>> I did not say they yield
poison, at least I did not try to
>>>>>>>>>>>> say that.
>>>>>>>>>>>> What are you referring to
exactly?
>>>>>>>>>>>>
>>>>>>>>>>> I was referring to this:
>>>>>>>>>>>
>>>>>>>>>>>>> But this makes
ptrtoint/icmp make UB-raising instructions,
>>>>>>>>>>>>> which
>>>>>>>>>>>>> contradicts with what LLVM
does.
>>>>>>>>>>>> As with other violation of
attributes I would, on first
>>>>>>>>>>>> though, suggest
>>>>>>>>>>>> to produce poison, not UB.
>>>>>>>>>>> But it is more about the
(imaginary) attribute, so maybe I
>>>>>>>>>>> was slightly out
>>>>>>>>>>> of topic.
>>>>>>>>>>>
>>>>>>>>>>>> 2) is fine, I think the
suggestion semantically makes sense
>>>>>>>>>>>> perfectly. 1)
>>>>>>>>>>>>> is something I'm
concerned about now.
>>>>>>>>>>>>>
>>>>>>>>>>>>> There are more than pointer
foldings, such as rewriting
>>>>>>>>>>>>> such expression,
>>>>>>>>>>>>> code motion ptr cmp,
introduce ptr cmp, etc. There is also
>>>>>>>>>>>>> analysis
>>>>>>>>>>>> relying
>>>>>>>>>>>>> on ptr cmp.
>>>>>>>>>>>>> Defining the correctness of
each of them is something we
>>>>>>>>>>>>> want to avoid,
>>>>>>>>>>>> and
>>>>>>>>>>>>> maybe that's why we
want to define precise semantics for
>>>>>>>>>>>>> things.
>>>>>>>>>>>> I don't get the point. My
proposal does not change the
>>>>>>>>>>>> semantics of
>>>>>>>>>>>> pointer comparisons, at all. I
explicitly mentioned that in
>>>>>>>>>>>> the last
>>>>>>>>>>>> email.
>>>>>>>>>>>>
>>>>>>>>>>> Oh okay, I thought it was a part of
the lifetime proposal,
>>>>>>>>>>> but it seems
>>>>>>>>>>> more like a separate thing.
>>>>>>>>>>> I agree that this requires
performance impact.
>>>>>>>>>>> Also investigation of existing
transformations would be
>>>>>>>>>>> needed; Alive2's
>>>>>>>>>>> pointer comparison is doing
approximation yet, but if it
>>>>>>>>>>> becomes fully
>>>>>>>>>>> precise, it will show something
from running LLVM unit tests
>>>>>>>>>>> I believe..! :)
>>>>>>>>>>>
>>>>>>>>>>>> I think this will be less
aggressive and may give nice
>>>>>>>>>>>> feedback to
>>>>>>>>>>>>> potential projects that are
using lifetime with non-alloca.
>>>>>>>>>>>> The lifetime marker debate,
basically 2) above, is
>>>>>>>>>>>> orthogonal to the
>>>>>>>>>>>> problem
>>>>>>>>>>>> you try to solve. It got mixed
in as lifetime markers were
>>>>>>>>>>>> used by
>>>>>>>>>>>> StackColoring
>>>>>>>>>>>> to perform coalescing but that
is coincidental. You can
>>>>>>>>>>>> (arguably)
>>>>>>>>>>>> coalesce stack
>>>>>>>>>>>> allocations regardless of
lifetime markers and with 1) such a
>>>>>>>>>>>> transformation
>>>>>>>>>>>> (w/ and w/o lifetime markers)
would actually be sound.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> At the end, we can
implement IR writer that lowers
>>>>>>>>>>>>> lifetime with
>>>>>>>>>>>> non-alloca
>>>>>>>>>>>>> into memset(undef). WDYT?
>>>>>>>>>>>> Yeah, 2) is orthogonal and we
can lower it that way. Unsure
>>>>>>>>>>>> if it is
>>>>>>>>>>>> helpful
>>>>>>>>>>>> but we can certainly define it
that way in the LangRef.
>>>>>>>>>>>>
>>>>>>>>>>> Okay, thanks!
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> ~ Johannes
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Juneyoung
>>>>>>>>>>>>>
>>>>>>>>>>>>> p.s. The reply was late,
sorry. I think I can spend more
>>>>>>>>>>>>> time on this
>>>>>>>>>>>> today.
>>>>>>>>>>>>> On Thu, Jan 7, 2021 at 9:02
AM Johannes Doerfert <
>>>>>>>>>>>> johannesdoerfert at
gmail.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 1/6/21 4:33 PM,
Juneyoung Lee wrote:
>>>>>>>>>>>>>>>>> Stepwisely
defining the semantics of instructions is a
>>>>>>>>>>>>>>>>> desirable
>>>>>>>>>>>>>>> direction
>>>>>>>>>>>>>>>>> IMO.
>>>>>>>>>>>>>>>> I'm
confused. What in the proposal would prevent us
>>>>>>>>>>>>>>>> from defining
>>>>>>>>>>>>>>>> the semantics
of instructions, or force us to do it in
>>>>>>>>>>>>>>>> an
"undesirable
>>>>>>>>>>>>>>> way"?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I meant it would be
great if the output state after
>>>>>>>>>>>>>>> executing an
>>>>>>>>>>>>>>> instruction can be
described using its input state.
>>>>>>>>>>>>>>> (that was the
meaning of 'stepwise semantics', I should
>>>>>>>>>>>>>>> have been more
>>>>>>>>>>>>>>> clear about this)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> For example, the
semantics of 'z = add x y' can be
>>>>>>>>>>>>>>> defined as follows:
>>>>>>>>>>>>>>> Given an input
state s, next state s' = s[z -> s(x) + s(y)]
>>>>>>>>>>>>>>> where s(x) is the
value of x in the previous state, and
>>>>>>>>>>>>>>> s[z -> v] is a
>>>>>>>>>>>>>>> state with z
updated to v.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Another example
that involves memory: the semantics of
>>>>>>>>>>>>>>> 'i = ptrtoint
p'
>>>>>>>>>>>>>> can
>>>>>>>>>>>>>>> be defined as
follows:
>>>>>>>>>>>>>>> Given an input
state s, next state s' = s[i ->
>>>>>>>>>>>>>>> s(p).obj.address +
>>>>>>>>>>>>>>> s(p).offset]
>>>>>>>>>>>>>>> where obj.address
is the begin address of a memory
>>>>>>>>>>>>>>> object obj pointed
>>>>>>>>>>>> by
>>>>>>>>>>>>>> p
>>>>>>>>>>>>>>> & offset is
p's byte offset. (Imagine a pointer to the
>>>>>>>>>>>>>>> offset of some
>>>>>>>>>>>>>> char
>>>>>>>>>>>>>>> array).
>>>>>>>>>>>>>>> Note that ptrtoint
& add can be nicely defined w.r.t the
>>>>>>>>>>>>>>> input state.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Now, the
instruction that we're facing is 'p = alloca'.
>>>>>>>>>>>>>>> To describe the
output state after executing 'p =
>>>>>>>>>>>>>>> alloca', the
address
>>>>>>>>>>>> of
>>>>>>>>>>>>>>> new alloca should
be determined.
>>>>>>>>>>>>>>> If observedness is
involved, we need to know the future
>>>>>>>>>>>>>>> state again. :/
>>>>>>>>>>>>>> We
>>>>>>>>>>>>>>> don't know
whether the alloca is going to be observed or
>>>>>>>>>>>>>>> not without
>>>>>>>>>>>>>> seeing
>>>>>>>>>>>>>>> the future.
>>>>>>>>>>>>>>> This is the problem
of the current lifetime intrinsics
>>>>>>>>>>>>>>> as well.
>>>>>>>>>>>>>> No, you mix things up
here.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Nobody proposed to
modify the semantics of `alloca`.
>>>>>>>>>>>>>> `alloca` provides you
with a fresh, unobserved block of
>>>>>>>>>>>>>> dereferenceable memory
that is implicitly freed as the stack
>>>>>>>>>>>>>> unwinds. That is it. No
context necessary.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> If you want to modify
the IR, you need to argue the
>>>>>>>>>>>>>> observable
>>>>>>>>>>>>>> semantics which is
nothing new. That this might require
>>>>>>>>>>>>>> more than
>>>>>>>>>>>>>> a peephole view of the
program is also not new.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> One possible
approach to resolve this is adding an
>>>>>>>>>>>>>>>
'unobserved' flag to
>>>>>>>>>>>>>>> alloca instruction
(similar to what was suggested by
>>>>>>>>>>>>>>> Nicolai).
>>>>>>>>>>>>>>> And, we can say
that if alloca with 'unobserved' is used by
>>>>>>>>>>>>>> ptrtoint/icmp,
>>>>>>>>>>>>>>> it is UB.
>>>>>>>>>>>>>> The flag can be added,
like we add other attributes. It
>>>>>>>>>>>>>> should not
>>>>>>>>>>>>>> be required for any
optimization we talked about though.
>>>>>>>>>>>>>> It basically
>>>>>>>>>>>>>> is a way to manifest
derived or given information into
>>>>>>>>>>>>>> the IR.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Attribute deduction, as
well as frontends with domain
>>>>>>>>>>>>>> knowledge,
>>>>>>>>>>>>>> can add such
information. The flag we discussed in phab
>>>>>>>>>>>>>> was not even
>>>>>>>>>>>>>> sufficient for all the
transformation examples I
>>>>>>>>>>>>>> presented in my mail,
>>>>>>>>>>>>>> that is why I extended
my  argument. We could still have
>>>>>>>>>>>>>> a "noescape"
>>>>>>>>>>>>>> flag for allocas, but
I'm not sure how useful that really
>>>>>>>>>>>>>> is. We can
>>>>>>>>>>>>>> certainly deduce it and
manifest it, unsure if we have
>>>>>>>>>>>>>> domain knowledge
>>>>>>>>>>>>>> we can use for
non-trivial cases though.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> But this makes
ptrtoint/icmp make UB-raising
>>>>>>>>>>>>>>> instructions, which
>>>>>>>>>>>>>>> contradicts with
what LLVM does.
>>>>>>>>>>>>>> As with other violation
of attributes I would, on first
>>>>>>>>>>>>>> though, suggest
>>>>>>>>>>>>>> to produce poison, not
UB.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Also, existing
optimizations like loop versioning can
>>>>>>>>>>>>>>> introduce
>>>>>>>>>>>>>>> ptrtoint/pointer
comparisons too.
>>>>>>>>>>>>>> Sure. I am not certain
why that is a problem. I get the
>>>>>>>>>>>>>> feeling things
>>>>>>>>>>>>>> are still mixed up
here.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> What I proposed is
twofold:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 1) We stop folding
comparisons between different allocas
>>>>>>>>>>>>>> if changing the
>>>>>>>>>>>>>> address
>>>>>>>>>>>>>>       of both might be
observable. Thus, if both might
>>>>>>>>>>>>>> have their address
>>>>>>>>>>>>>>
"taken"/escaped,
>>>>>>>>>>>>>>       other than the
comparisons we want to fold, we
>>>>>>>>>>>>>> cannot proceed.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2) We define lifetime
markers to mean `memset(undef)`.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The first should be
sufficient for the problem you were
>>>>>>>>>>>>>> trying to solve
>>>>>>>>>>>>>> in the
>>>>>>>>>>>>>> first place. The second
makes lifetime markers less
>>>>>>>>>>>>>> weird. Note that 1)
>>>>>>>>>>>>>> is not changing
>>>>>>>>>>>>>> the semantics of the
IR. We basically just argue there is
>>>>>>>>>>>>>> a bug in our
>>>>>>>>>>>>>> instcombine right
>>>>>>>>>>>>>> now as we do not check
all necessary preconditions.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I see that there
are other questions that I didn't
>>>>>>>>>>>>>>> answer yet, but let
>>>>>>>>>>>> me
>>>>>>>>>>>>>>> answer this first
to limit the length of the text :)
>>>>>>>>>>>>>> Sure, we can split the
discussion :)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ~ Johannes
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> Juneyoung
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Thu, Jan 7, 2021
at 3:36 AM Johannes Doerfert <
>>>>>>>>>>>>>> johannesdoerfert at
gmail.com>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 1/5/21 8:00
PM, Juneyoung Lee wrote:
>>>>>>>>>>>>>>>>> Hi
Johannes,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I read your
proposal and thought about the model.
>>>>>>>>>>>>>>>> Cool, thanks!
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> As you
concerned in A3, certain programs may be valid
>>>>>>>>>>>>>>>>> only when
>>>>>>>>>>>> memory
>>>>>>>>>>>>>>>>> blocks with
overlapping lifetimes have disjoint
>>>>>>>>>>>>>>>>> addresses.
>>>>>>>>>>>>>>>>> Look at
this example (I'm using malloc, but alloca
>>>>>>>>>>>>>>>>> also
works):
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> p1 =
malloc(4)
>>>>>>>>>>>>>>>>> p2 =
malloc(4) // for brevity, assume that there as
>>>>>>>>>>>>>>>>> enough
space & p1
>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>> p2 != null
>>>>>>>>>>>>>>>>>
set<char*> s;
>>>>>>>>>>>>>>>>>
s.insert(p1); s.insert(p2); // If the second insert
>>>>>>>>>>>>>>>>> did
nothing, it
>>>>>>>>>>>>>> would
>>>>>>>>>>>>>>>>> be surprise
to programmers
>>>>>>>>>>>>>>>>> work(s);
>>>>>>>>>>>>>>>>> free(data1)
>>>>>>>>>>>>>>>>> free(data2)
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Clearly, IR
semantics should guarantee that escaped
>>>>>>>>>>>>>>>>> blocks are
>>>>>>>>>>>>>> disjoint.
>>>>>>>>>>>>>>>> It
>>>>>>>>>>>>>>>>> would be
great for verification tools on LLVM IR to be
>>>>>>>>>>>>>>>>> able to
answer
>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>> the second
insert will succeed.
>>>>>>>>>>>>>>>> I agree, the
second insert should succeed, assuming `p1
>>>>>>>>>>>>>>>> && p2`.
>>>>>>>>>>>>>>>> I don't
think my proposal would in any way impact the
>>>>>>>>>>>>>>>> program above,
>>>>>>>>>>>>>>>> if anything the
A3 reasoning makes sure such a program
>>>>>>>>>>>>>>>> with allocas
>>>>>>>>>>>>>>>> is not
miscompiled.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I'm also
not sure I understand what you try to argue
>>>>>>>>>>>>>>>> for. Maybe
>>>>>>>>>>>>>>>> elaborate a bit
what it is you think is bad or needs to
>>>>>>>>>>>>>>>> be changed.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The problem
is that definition of escapedness is not
>>>>>>>>>>>>>>>>> clear at
the
>>>>>>>>>>>>>>>> semantic
>>>>>>>>>>>>>>>>> level.
Describing the IR semantics w.r.t. LLVM's
>>>>>>>>>>>>>>>>> escaped
analysis
>>>>>>>>>>>>>>>> function
>>>>>>>>>>>>>>>>> isn't
something we want.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Semantic
definition of escapedness of a pointer seems
>>>>>>>>>>>>>>>>> hard, I
mean
>>>>>>>>>>>> in a
>>>>>>>>>>>>>>>>> stepwise
manner.
>>>>>>>>>>>>>>>>> It
isn't a single instruction such as 'escape i8*
>>>>>>>>>>>>>>>>> ptr',
and we need
>>>>>>>>>>>> to
>>>>>>>>>>>>>>>> look
>>>>>>>>>>>>>>>>> over all
instructions in the function. For example,
>>>>>>>>>>>>>>>>>
'(int)(p+1) -
>>>>>>>>>>>>>> (int)p'
>>>>>>>>>>>>>>>>> isn't
semantically escaping the pointer p because the
>>>>>>>>>>>>>>>>> result is 1
>>>>>>>>>>>>>>>>> regardless
of the value of p.
>>>>>>>>>>>>>>>>> Stepwisely
defining the semantics of instructions is a
>>>>>>>>>>>>>>>>> desirable
>>>>>>>>>>>>>>>> direction
>>>>>>>>>>>>>>>>> IMO.
>>>>>>>>>>>>>>>> I'm
confused. What in the proposal would prevent us
>>>>>>>>>>>>>>>> from defining
>>>>>>>>>>>>>>>> the semantics
of instructions, or force us to do it in
>>>>>>>>>>>>>>>> an
"undesirable
>>>>>>>>>>>>>>>> way"?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> In
practice, syntactically checking escapedness + nice
>>>>>>>>>>>>>>>>> engineering
>>>>>>>>>>>>>> might
>>>>>>>>>>>>>>>>> not break
optimizations in most cases (as undef/poison
>>>>>>>>>>>>>>>>> did); but
it
>>>>>>>>>>>>>> would
>>>>>>>>>>>>>>>>> be great to
move to another level, since LLVM IR is
>>>>>>>>>>>>>>>>> used in so
many
>>>>>>>>>>>>>>>> places
>>>>>>>>>>>>>>>>> :)
>>>>>>>>>>>>>>>> The property
under which you can coalesce objects is
>>>>>>>>>>>>>>>> simple:
>>>>>>>>>>>>>>>>       It is not
observable.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Now, if the
address of one of the two objects you
>>>>>>>>>>>>>>>> coalesce is not
>>>>>>>>>>>>>>>> observed,
coalescing is not observable. That is a
>>>>>>>>>>>>>>>> sufficient
condition
>>>>>>>>>>>>>>>> not a necessary
one. Pointer "escaping" is one step
>>>>>>>>>>>>>>>> further. If the
>>>>>>>>>>>>>>>> address
doesn't escape it is not observed. This does
>>>>>>>>>>>>>>>> not mean the
>>>>>>>>>>>>>>>> "semantic
conditions for coalescing", e.g., for the
>>>>>>>>>>>>>>>> purpose of
>>>>>>>>>>>>>> translation
>>>>>>>>>>>>>>>> validation, is
supposed to be build on top of our
>>>>>>>>>>>>>>>>
"definition of
>>>>>>>>>>>>>> escaping
>>>>>>>>>>>>>>>> pointers".
That said, we use "does not escape" as a
>>>>>>>>>>>>>>>> precondition
for
>>>>>>>>>>>>>>>> various
transformation and I'm unsure what is any
>>>>>>>>>>>>>>>> different now.
The
>>>>>>>>>>>>>>>> entire escape
argument is only used in the validity of
>>>>>>>>>>>>>>>> the pointer
>>>>>>>>>>>>>> folding.
>>>>>>>>>>>>>>>> Similarly, we
can fold a comparison of a noalias
>>>>>>>>>>>>>>>> pointer with
another
>>>>>>>>>>>>>> one
>>>>>>>>>>>>>>>> if the former
does not escape (and both are
>>>>>>>>>>>>>>>> dereferenced
and one is
>>>>>>>>>>>>>>>> written for
sure).
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The pointer
comparison is another beast. It indeed has
>>>>>>>>>>>>>>>>> a few
issues,
>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>> solving it
might require nontrivial solution.
>>>>>>>>>>>>>>>> I think the
main problem of the inconsistencies and
>>>>>>>>>>>>>>>> such we've
seen is
>>>>>>>>>>>>>>>> rooted in the
erroneous folding of pointer comparisons.
>>>>>>>>>>>>>>>> Cleaning up
>>>>>>>>>>>> the
>>>>>>>>>>>>>>>> lifetime marker
semantics is actually unrelated and
>>>>>>>>>>>>>>>> simply not
folding
>>>>>>>>>>>>>>>> as described in
A3 should solve the issue that has been
>>>>>>>>>>>>>>>> reported.
>>>>>>>>>>>>>>>> Nevertheless,
>>>>>>>>>>>>>>>> we should take
a crack at lifetime markers while we are
>>>>>>>>>>>>>>>> here.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> ~ Johannes
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>> Juneyoung
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Tue, Jan
5, 2021 at 9:37 AM Johannes Doerfert <
>>>>>>>>>>>>>>>>
johannesdoerfert at gmail.com>
>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hi
Juneyoung,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Happy
new year :)
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> After
we had a lengthy discussion on phab last year,
>>>>>>>>>>>>>>>>>>
I've tried to
>>>>>>>>>>>>>>>>>>
summarize my thoughts,
>>>>>>>>>>>>>>>>>>
especially given that I had some time to think about
>>>>>>>>>>>>>>>>>> things
over the
>>>>>>>>>>>>>>>> break.
>>>>>>>>>>>>>>>>>> Still,
no promises on the quality ;)
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I start
with general questions I asked myself and
>>>>>>>>>>>>>>>>>> then go
on
>>>>>>>>>>>> rambling
>>>>>>>>>>>>>>>>>> about a
potential design.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Q1: Is
lifetime a given property or a derived one,
>>>>>>>>>>>>>>>>>> thus is
it fixed
>>>>>>>>>>>> or
>>>>>>>>>>>>>>>>>>
modifiable?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> This is
a question I asked myself a lot recently. I
>>>>>>>>>>>>>>>>>> think
it is
>>>>>>>>>>>>>> derived
>>>>>>>>>>>>>>>>>> and
modifiable,
>>>>>>>>>>>>>>>>>> at
least I hope it is. Only that would allow
>>>>>>>>>>>>>>>>>>
transformations I would
>>>>>>>>>>>>>>>> want
>>>>>>>>>>>>>>>>>> us to
do. Here are some examples:
>>>>>>>>>>>>>>>>>>
https://godbolt.org/z/G8obj3
>>>>>>>>>>>>>>>>>>
https://godbolt.org/z/obaTc
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Q2: Is
a pointer comparison, or similar use,
>>>>>>>>>>>>>>>>>>
extending the lifetime?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Asked
differently, can we hoist a pointer comparison
>>>>>>>>>>>>>>>>>> into a
region
>>>>>>>>>>>>>> where
>>>>>>>>>>>>>>>>>> the
pointer is dead?
>>>>>>>>>>>>>>>>>> This is
an important question which we haven't
>>>>>>>>>>>>>>>>>>
discussed much as we
>>>>>>>>>>>>>>>>>> assumed
LICM has to work.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> The
current behavior is that non-dereferencing uses
>>>>>>>>>>>>>>>>>> are not
>>>>>>>>>>>> extending
>>>>>>>>>>>>>>>>>> the
lifetime and are
>>>>>>>>>>>>>>>>>> allowed
outside of "lifetime regions" (as indicated
>>>>>>>>>>>>>>>>>> by
markers).
>>>>>>>>>>>> They
>>>>>>>>>>>>>>>>>> will
always produce valid
>>>>>>>>>>>>>>>>>>
results. Though, we might want to think about a
>>>>>>>>>>>>>>>>>>
lifetime marker that
>>>>>>>>>>>>>>>>>> spits
out a new pointer
>>>>>>>>>>>>>>>>>> value
instead of reusing the old one.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Q3: Can
we use lifetime to argue about addresses?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> The
question here is, can we fold address comparisons
>>>>>>>>>>>>>>>>>> based
on
>>>>>>>>>>>>>>>>>>
lifetimes, or not.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> So far,
we fold comparisons based on "address
>>>>>>>>>>>>>>>>>>
information". For
>>>>>>>>>>>>>> example,
>>>>>>>>>>>>>>>>>> we
"know" globals,
>>>>>>>>>>>>>>>>>>
allocas, and mallocs cannot be equal to one another.
>>>>>>>>>>>>>>>>>> Also,
two
>>>>>>>>>>>>>> distinct
>>>>>>>>>>>>>>>>>>
allocations, for globals
>>>>>>>>>>>>>>>>>> and
allocas, are considered unequal. Now, the crux is
>>>>>>>>>>>>>>>>>> that we
have
>>>>>>>>>>>> to
>>>>>>>>>>>>>> be
>>>>>>>>>>>>>>>>>>
consistent if we do two
>>>>>>>>>>>>>>>>>>
comparisons, and, as of now, we are not (bug number
>>>>>>>>>>>>>>>>>>
missing). Since
>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>> backend
(or any other place
>>>>>>>>>>>>>>>>>> for
that matter) might coalesce allocas, their
>>>>>>>>>>>>>>>>>>
addresses might not
>>>>>>>>>>>> be
>>>>>>>>>>>>>>>>>>
different after all. If we
>>>>>>>>>>>>>>>>>> already
folded a comparison to "unequal" we are
>>>>>>>>>>>>>>>>>> doomed
if we later
>>>>>>>>>>>>>> have
>>>>>>>>>>>>>>>>>> a
comparison that results
>>>>>>>>>>>>>>>>>> in
"equal". (Note, this is different from aliasing
>>>>>>>>>>>>>>>>>> rules
as they can
>>>>>>>>>>>>>> be
>>>>>>>>>>>>>>>>>>
stricter.)
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Design:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I would
hope we can come up with a model that treats
>>>>>>>>>>>>>>>>>> memory
"the
>>>>>>>>>>>>>> same",
>>>>>>>>>>>>>>>>>>
regardless if it is global,
>>>>>>>>>>>>>>>>>> stack,
or heap. I want to avoid special casing one of
>>>>>>>>>>>>>>>>>> them
wrt.
>>>>>>>>>>>>>> lifetime
>>>>>>>>>>>>>>>>>> as I
believe most optimizations
>>>>>>>>>>>>>>>>>> would
apply to any of them, potentially for different
>>>>>>>>>>>>>>>>>> reasons
and
>>>>>>>>>>>> with
>>>>>>>>>>>>>>>>>>
different gains, but nevertheless.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
Proposal (largely based on the conversation in phab):
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> A1:
Lifetime is a concept that talks about memory
>>>>>>>>>>>>>>>>>> content
*only*.
>>>>>>>>>>>>>>>>>>
Basically, the memory content is set to
>>>>>>>>>>>>>>>>>>
         undefined by lifetime markers. It is
>>>>>>>>>>>>>>>>>>
derived/modifiable as
>>>>>>>>>>>> it
>>>>>>>>>>>>>> is
>>>>>>>>>>>>>>>>>> just an
"as-is" property of the memory
>>>>>>>>>>>>>>>>>>
         content. The lifetimes of an object, as
>>>>>>>>>>>>>>>>>>
described by
>>>>>>>>>>>> markers,
>>>>>>>>>>>>>>>> might
>>>>>>>>>>>>>>>>>> change
during the compilation. They
>>>>>>>>>>>>>>>>>>
         might become smaller if we deduce the object
>>>>>>>>>>>>>>>>>> is not
accessed
>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>> the
memory content is not used, they
>>>>>>>>>>>>>>>>>>
         might become larger if objects with
>>>>>>>>>>>>>>>>>>
non-overlapping
>>>>>>>>>>>> lifetimes
>>>>>>>>>>>>>> are
>>>>>>>>>>>>>>>>>>
coalesced. (One could see the latter as
>>>>>>>>>>>>>>>>>>
         introducing a new object though.)
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> A2: If
we define lifetime as above, it has nothing to
>>>>>>>>>>>>>>>>>> do with
the
>>>>>>>>>>>>>>>>>> address
of an object. Consequently, pointer
>>>>>>>>>>>>>>>>>>
         comparisons and similar operations are valid
>>>>>>>>>>>>>>>>>> outside
the
>>>>>>>>>>>>>> lifetime.
>>>>>>>>>>>>>>>>>> Loads
and stores are as well, they can
>>>>>>>>>>>>>>>>>>
         even not be removed "easily". A store
>>>>>>>>>>>>>>>>>>
followed by a lifetime
>>>>>>>>>>>>>>>> marker
>>>>>>>>>>>>>>>>>> or a
load following a lifetime marker
>>>>>>>>>>>>>>>>>>
         is dead or results in undef respectively.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> A3: We
could not use lifetime to argue about
>>>>>>>>>>>>>>>>>>
addresses. This means
>>>>>>>>>>>> we
>>>>>>>>>>>>>>>>>>
could/should also not argue that overlapping
>>>>>>>>>>>>>>>>>>
         lifetimes result in different addresses.
>>>>>>>>>>>>>>>>>> Thus, a
comparison
>>>>>>>>>>>>>>>> between
>>>>>>>>>>>>>>>>>> the
address of two allocas could not
>>>>>>>>>>>>>>>>>>
         immediately be folded. That said, there
>>>>>>>>>>>>>>>>>> would
be special
>>>>>>>>>>>> cases
>>>>>>>>>>>>>>>>>> though.
Basically, if one of the allocas does
>>>>>>>>>>>>>>>>>>
         not escape, other than the comparisons to be
>>>>>>>>>>>>>>>>>> folded,
we can
>>>>>>>>>>>>>> fold
>>>>>>>>>>>>>>>>>> them.
Afterwards, coalescing or splitting
>>>>>>>>>>>>>>>>>>
         would still be consistent because it is
>>>>>>>>>>>>>>>>>>
unobservable. The
>>>>>>>>>>>>>> problem
>>>>>>>>>>>>>>>>>> we have
in-tree is that we fold even though
>>>>>>>>>>>>>>>>>>
         the address is still observed (after the
>>>>>>>>>>>>>>>>>> fold).
It is still
>>>>>>>>>>>>>>>> unclear
>>>>>>>>>>>>>>>>>> to me
what the impact of this would be
>>>>>>>>>>>>>>>>>>
         on real code. I suggested before that we run
>>>>>>>>>>>>>>>>>> some
>>>>>>>>>>>> experiments
>>>>>>>>>>>>>>>> first
>>>>>>>>>>>>>>>>>> before
we make any decision whatsoever.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> This is
pretty much saying that lifetime markers are
>>>>>>>>>>>> `memset(undef)`,
>>>>>>>>>>>>>> as
>>>>>>>>>>>>>>>>>> you
suggested before (I think).
>>>>>>>>>>>>>>>>>> There
are some implementation level differences but
>>>>>>>>>>>>>>>>>> at the
end of
>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>> day
they are basically the same.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Happy
to hear some thoughts on this, especially if it
>>>>>>>>>>>>>>>>>> fixes
the
>>>>>>>>>>>>>> problems
>>>>>>>>>>>>>>>>>> that
lead to D93376 in the first place.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> ~
Johannes
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On
12/18/20 2:42 AM, Juneyoung Lee via llvm-dev wrote:
>>>>>>>>>>>>>>>>>>>
Hello all,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
We're discussing the well-formedness of
>>>>>>>>>>>>>>>>>>>
lifetime.start/end
>>>>>>>>>>>> intrinsic
>>>>>>>>>>>>>>>>>> here (
>>>>>>>>>>>>>>>>>>>
https://reviews.llvm.org/D93376), deciding what is a
>>>>>>>>>>>> (syntactically
>>>>>>>>>>>>>> &
>>>>>>>>>>>>>>>>>>>
semantically) valid use of these intrinsics.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
I'd like to gather more context about the intrinsics.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> 1.
Is there a frontend or framework that introduces
>>>>>>>>>>>>>>>>>>>
lifetime call
>>>>>>>>>>>>>> with
>>>>>>>>>>>>>>>>>>>
non-stack allocated objects?
>>>>>>>>>>>>>>>>>>> If
exists, which behavior do they expect from it?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> 2.
Can a block's lifetime be started bytewise or
>>>>>>>>>>>>>>>>>>>
elementwise?
>>>>>>>>>>>>>>>>>>> I
imagine an optimization that allow using stack
>>>>>>>>>>>>>>>>>>>
very compactly,
>>>>>>>>>>>> but
>>>>>>>>>>>>>>>>>> wonder
>>>>>>>>>>>>>>>>>>>
there is a real-world use case.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
Thanks,
>>>>>>>>>>>>>>>>>>>
Juneyoung
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
_______________________________________________
>>>>>>>>>>>>>>>>>>>
LLVM Developers mailing list
>>>>>>>>>>>>>>>>>>>
llvm-dev at lists.llvm.org
>>>>>>>>>>>>>>>>>>>
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>
>>>
>

llvm dev - Jan 2021 - lifetime.start/end

[llvm-dev] lifetime.start/end

[llvm-dev] lifetime.start/end

[llvm-dev] lifetime.start/end