thr3ads.net - llvm dev - [LLVMdev] Request for comments: TBAA on call [Oct 2013]

If this information is useful, please help other people find it:
Share via:

Manman Ren

2013-Oct-08 19:45 UTC

[LLVMdev] Request for comments: TBAA on call

TBAA MDNodes should describe the type hierarchy (with struct-path aware
TBAA, it is a type DAG).
It sounds okay to me to add !tbaa.call to function calls to describe which
fields of the type system the call is touching.
One example can be !tbaa.call {read a list of tag nodes, write a list of
tag nodes}.

Thanks,
Manman


On Mon, Oct 7, 2013 at 11:49 PM, Daniel Berlin <dberlin at dberlin.org>
wrote:
>
>
>
> On Mon, Oct 7, 2013 at 7:34 PM, Filip Pizlo <fpizlo at apple.com>
wrote:
>
>> I think we’re talking cross-purposes at this point.  Let me try to step
>> back and concisely restate what I’m trying to do:
>>
>> - I am writing a frontend for a high-level language and I emit LLVM IR.
>> - I control the TBAA tree.  I don’t use it to express the high-level
>> language’s types, since that would make absolutely no sense. 
JavaScript
>> doesn’t have types.  But by the time my compiler is generating LLVM IR,
it
>> will have come up with meaningful constraints on aliasing and those
>> constraints are naturally expressed using TBAA.
>> - Some calls that I emit in LLVM IR are neither readnone/readonly nor
do
>> they clobber the world - instead they clobber a known set of memory
>> locations, and that set of memory locations is already expressible in
the
>> TBAA representation that I control.
>>
>> Which part of the above do you disagree with?
>>
>>
> Nothing.
> I disagree that just because you control the TBAA tree generation, that
> you should shoehorn something that is neither what the langref claims it
> should be, or what other producers produce, just because the datastructure
> may kinda-sorta work for your purpose.
>
>
>> On Oct 7, 2013, at 4:57 PM, Daniel Berlin <dberlin at
dberlin.org> wrote:
>>
>> On Mon, Oct 7, 2013 at 4:22 PM, Filip Pizlo <fpizlo at apple.com>
wrote:
>>
>>
>>
>> On Oct 7, 2013, at 3:49 PM, Daniel Berlin <dberlin at
dberlin.org> wrote:
>>
>>
>>
>>
>> On Wed, Sep 25, 2013 at 5:50 PM, Filip Pizlo <fpizlo at
apple.com> wrote:
>>
>>
>> Hi all,
>>
>> TBAA on loads and stores is super useful for conveying high-level type
>> knowledge to LLVM transformations.  But often translating a high-level
>> language into LLVM IR will result in runtime calls that are known at
the
>> high level to only affect (either by reading or by writing) some
restricted
>> subset of the heap.  These aren't read-only calls; that is often
too strong
>> of a guarantee.  A great example would be a GC barrier or GC allocation
>> that may arise if LLVM is used as a backend for a compiler for a
>> higher-level language.  There are probably other examples that arise in
>> high-level languages, but let's use allocation as an example for
now.  I
>> can envision two ways of representing allocation in LLVM IR:
>>
>> Method #1:
>>
>>            %object = call @runtime_allocate(... /* pass size and/or
other
>> data */)
>>
>> Method #2:
>>
>>        SomeBlock:
>>            ... ; code preceding the allocation
>>            %objectFast = ... ; IR for the inlined allocation fast path,
>> which produces %objectFast but may yield null if the fast path fails
and we
>> need to take slow path
>>            %didFail = icmp eq %objectFast, null
>>            br %didFail, label %AllocationSlowPath, label &Continue
>>        AllocationSlowPath:
>>            %objectSlow = call @runtime_allocate_slow_path(... /* pass
>> size and/or other data */)
>>            br label %Continue
>>        Continue:
>>            %object = phi [%objectFast, SomeBlock], [%objectSlow,
>> AllocationSlowPath]
>>
>> In both of these cases, the call instruction will appear to clobber the
>> world leading to more pessimistic optimization.  Moreover, it's not
always
>> correct to mark the allocation as being readonly; it may read or write
>> observable state.  But if you don't like the allocation example
then you
>> can imagine that a GC store barrier would have an almost identical
pattern
>> to #2 and it would definitely not be readonly.  Anyway, such a pattern
>> would lead to fewer loads being eliminated or hoisted, fewer stores
being
>> eliminated or sunk, etc - all due to those pesky calls.  But the
high-level
>> code generator that produces this IR will know for sure that
>> @runtime_allocate or @runtime_allocate_slow_path can only affect a very
>> narrow subset of the heap: namely, it will only read and write the
GC's
>> data structures.  So most of the loads and stores surrounding the
>> allocation, that arose from source-level loads and stores, would
definitely
>> not have any interference with the call and this would !
>> be easy to convey using TBAA.
>>
>> Hence if we could decorate the calls to these runtime functions as only
>> reading or writing certain TBAA types,
>>
>>
>>
>> Out of curiosity, why is this tied to TBAA types, rather than variables
>> or something else?
>> IE what analysis are you performing that gives you type answers instead
>> of answers in terms of local/global variables?
>>
>>
>> TBAA’s utility for non-C languages (like JavaScript, in my case) isn’t
>> for reasoning about user-facing types, but for creating what you could
>> alternatively think of as an “abstract heaps”.  TBAA is perfect for
this.
>>
>>
>>
>> No. A nested hierarchy that describes conflicts among possible
>> abstract heap locations is perfect for this. TBAA is not quite the
>> same. See below.
>>
>>
>> For example I might know that a certain function clobbers a field
called
>> ‘foo’.  In my frontend I create an "abstract heap" for each
field name, and
>> this gets lowered to LLVM TBAA.
>>
>>
>>
>> So let's start simple.
>>
>> You are representing an abstract set of heap locations. Are they
>> actually in any way related to the *existing type tree* that is being
>> created for TBAA?
>>
>>
>> Yes.  Because I am also creating the TBAA type tree.  I control it
>> entirely.
>>
>
> So then the answer is really "no", in the sense that what you
want to
> produce  is unlike any of the current producers or what is actually
> described :)
>
>
>>
>> Or are you adding new values to it that are not
>> really part of a type hierarchy at all, but represent some other
>> relation among abstract heaps?
>> From your description (you create new things that represent abstract
>> heap locations and lower that into a nested tree), it sounds like the
>> answer is "you are adding new locations that represent a different
>> tree than the existing TBAA”.
>>
>>
>> What do you mean by “the existing TBAA”?  To my understanding, LLVM
>> itself never creates TBAA nodes; they arise entirely out of the
frontend
>> that emits LLVM IR.  I control the frontend in this case.
>>
>>
>> If that is not true, what is the actual relation to the existing
>> information?
>> TBAA in the LLVM compiler, at least as currently produced, while
>> theoretically compatible with this scheme , is not quite the same.
>> The langref says "metadata is added to the IR to describe a type
>> system of a higher level language.”
>>
>>
>> My understanding is that LLVM doesn’t produce TBAA.  It consumes TBAA.
>>
> True.
>
>
>>
>> Hence it’s more meaningful to reason about TBAA in terms of its
semantics
>> rather than hypothesizing about how and why someone would produce it.
>>
>
> That would be great, but it's not what the langref says, nor does it
match
> up with the name of the thing you are creating, nor does it necessarily
> have optimal semantics, nor would it be great for future producers or the
> ability to do other analyses in those producers.
>
>
>>  We agree that TBAA is a nested set structure that describes heap
>> locations.
>>
>
> Though note that it is bidirectional which is actually probably
> non-optimal for describing heap locations that are unrelated to types.
>
>
>
>>   I’m observing that it would be useful to express bounds on what heap
>> locations that calls read or write.  Therefore, TBAA can be used to
convey
>> those sets.
>>
>
> I do not disagree that it can, i disagree that it *should*.
>
>
>>
>>
>> If you are adding a new tree just to represent new abstract heap
>> location info, you are essentially reusing it simply because it
>> represents a nested set structure.   I don't see a lot of value in
>> that sort of conflation, hence the question.
>>
>>
>> If we agree that it would be good to have a system for bounding what a
>> call reads or writes using nested sets, then we have two choices:
>>
>> - Reuse TBAA, which can already be used to express nested sets of
memory
>> locations.
>>
> Depending upon the properties of those sets, yes, I agree.
>
>
>>
>> - Invent a whole new type system that is structurally identical to TBAA
>> but that goes by a different name.
>>
>
> Why would it necessarily be structurally identical?
> TBAA is a good approximation for your particular analysis, but it's
nested
> set representation will not necessarily be optimal in the end for what else
> people produce for call based info
> By shoving one into the other, you force this representation on everyone,
> essentially until someone else is going to come along and untangle them
> again.
> You also force all producers that want to represent the output of some
> kind of analyses around calls, *and* actual type based alias info, to
> combine both into a single tree.
>
>
>> Are you seriously proposing the latter?
>>
>> Yes, I am seriously proposing that you do not shoehorn something
> completely different than what the langref describes, and the code expects,
> into an existing structure.
>
> If you want to change the langref, all the tests, the name,  etc, first,
>  I would object, but that would certainly be less ugly to do that in
> addition to your proposal.
>
>>
>> If not, can you describe in more detail how you are generating this
info?
>> Because that is what i get from your description.
>>
>> The information you are talking about here is traditional mod/ref info
>> that is completely independent of TBAA.
>>
>>
>> You lost me.  I’m writing a compiler that generates LLVM IR.  I have
>> high-level knowledge that allows me to prove the set of abstract heap
>> locations that may be read or written by any call, load, or store.
>>
>> What do you suggest as a mechanism by which I can communicate that
>> information to LLVM?
>>
>>
>> I'm suggesting you create an attribute that deals with abstract
heap
>> locations that you are describing, instead of overloading and tying it
>> into an existing set of information that  has a different purpose
>> (TBAA, which at least in the current world, represents a nested type
>> tree that describes access rules of a language).
>>
>>
>> So, you’re suggesting inventing a new representation for type
hierarchies,
>>
>
> I assume you mean abstract heap locations.
>
>
>>  that would have the following properties:
>>
>> - The same rules as TBAA for load/store.
>>
>
> For starters, yes.
> Note that the bidirectional (both up and down the tree) check does TBAA is
> not necessarily needed, depending on how you formulate it.
>
>>
>> - The same rules as what I propose for call.
>>
>> - Identical to TBAA in all other respects, except that it would have a
>> different name.
>>
>> Am I understanding your proposal correctly?
>>
>
> You are understanding where I suggest you start, yes.  I expect it will
> slowly grow to be quite different than the existing TBAA tree.
>
>
>>
>>
>> Are you saying that such a mechanism already exists and that I should
use
>> that instead?
>>
>> As far as I can tell, TBAA is the cleanest such mechanism that I have
>> found thus far and its only shortcoming is that I cannot use it to
annotate
>> calls.
>>
>>
>> The fact that some other mechanism is kinda-sorta-like-what-you-want
>> does not necessarily mean it should be tied into it :)
>>
>> Hence, I am trying to understand the actual relation between the TBAA
>> tree as it exists now
>>
>>
>> What TBAA tree?
>>
> The one produced by the only current producers.
> The one whose purpose is described by the langref, and even the comments
> in various places
>
>
>>   Are you referring to the TBAA format as described in
>> http://llvm.org/docs/LangRef.html#tbaa-metadata, or some specific TBAA
>> tree that exists for some specific frontend for some specific language?
>>
>
> Let's say "both".
> Since the TBAA tree produced by other producers all fall into the category
> of "actual high level type hierarchy info", as the langref
describes.
>
>>
>>  , and what you are suggesting creating.
>> For example, i could perform pointer analysis in clang and generate a
>> TBAA tree from that. It would work.  It would be easy.  It doesn't
>> mean it is necessarily a great approach.
>>
>>
>> Would your pointer analysis generate either disjoint sets, or nested
>> sets?
>>
>
> This is entirely possible, for example, all the andersens's will end up
> directional.
> All of context-sensitive will end up another way.
> Is your suggestion that they then drop all this info because it can't
be
> shoehorned into the existing TBAA format?
> And that someone else later disentangle this from the actual *type based
> info* being produced by other frontends?
> What if someone wants to produce both kinds of info?
> Right now your description is essentially the output of an analysis that
> can be run just as well in clang on C/C++
>
> For producers that *also* want to convey type based alias info and
> abstract heap location info for calls,  it now requires combining multiple
> analysis outputs into a single tree called, confusingly, a "TBAA
tree".
>
> Can you explain what, other than saving you the same amount of work that
> was recently done for struct-tbaa,  your proposal buys?
>
>
>
>
>
>>  For example, if you did a Steensgaard-style pointer analysis then TBAA
>> *would* be the most ideal way of representing the results of such an
>> analysis.
>>
>
> Actually, no.
> For a consumer, if it's static, the ideal representation for a *TBAA
tree*
> would be a simple list of nodes and dfs in/out numbers attached to those
> nodes, produced from walking the tree as it exists in your frontend.  There
> is no reason to explicitly represent the actual child/parent relationships
> among the nodes.
>
> You can then check ancestry by doing what we do elsewhere (for example,
> dominates(a, b) checking), and use the DFS numbers to check ancestry.
>  There is no need to actually represent the tree, or walk it, as LLVM does
> now.  This method will give you constant time and less space usage than
> what you have now.
>
> If the TBAA tree does not really represent anything in particular, giving
> it explicit structure when it's structure means nothing is entirely
> pointless.
>
> In the case of steensgaard in particular, if you are just asking
> aliases(x, y) questions, you actually don't need the points-to
> relationships as represented by ECR's, only something much simpler -
> whether two things are in the same disjoint set.  You don't need the
union
> find structure to do this once you've finished the algorithm, you could
> just number the equivalence classes and go from there.
>
> Of course, more complex queries about the relationships between the
ECR's
> themselves (IE the points-to graph), you'd need parent-child
relationships.
>
> But this is all besides the point, except to point out it's not
actually
> optimal for what you are suggesting representing :)
>
>
>>
>
>
>> One way to view my front end’s aliasing knowledge is that I’m doing
>> something like a unification-based analysis.  I actually end up with a
>> three-level TBAA hierarchy (the world, with subtypes for my disjoint
sets
>> of heap locations; some of those sets then have further subtypes).
>>
>
> Yes.
>
>>
>>
>>
>>
>> The more specific the info you can get from the metadata, the better
off
>> you are eliminating loads/stores
>>
>>
>> Otherwise, yes, it's certainly incredibly helpful to the other
>> optimizations.
>>
>> However, unless there is a good reason, i'd suggest you just make
it
>> generic call read/write metadata that talks about values (be they
incoming
>> arguments, alloca'd memory, or whatever else).
>>
>>
>> You lost me.  How does this help a high-level compiler convey to LLVM
>> that a particular call and a particular load or store don’t have any
>> dependency?
>>
>>
>>
>> You add the modref info to the calls. You add attributes to the the
>> loads/stores.  It becomes another disambiguation method to tell
>> whether the load/store and the call have a dependency.
>>
>>
>> Are you suggesting that I override AliasAnalysis?
>>
>> I’m trying to make it possible for a frontend to convey this
information
>> using LLVM IR, rather than having to override internal LLVM classes in
>> order to provide this information.
>>
>
> Can you  explain what the benefit of doing so is, past  some amount of
> work ( that was even done recently in a similar contextwithout much fanfare
> in a not-horrible amount-of-time)?
> I think i've described the downsides adequately:
>
> It's not an ideal representation depending on the analysis
> For producers that *also* want to convey type based alias info in addition
> to this type of info, it requires combining multiple analysis outputs into
> a single tree. This gets messy, quickly.
> It overloads something with a pretty well-defined meaning right now.
> It's not actually optimal as a representation for the output of most
kinds
> of analysis
> It confuses a certain type of mod/ref information with actual type-based
> aliasing info.
>
>
>
>
>> Hence there is a question of what format to use.  I’m arguing that TBAA
>> is that format.
>>
>
>
>>
>>
>>
>> My proposal achieves this, albeit using a coarse-grained at a
>> coarse-grained level: you have to express your knowledge of
dependencies
>> using a hierarchy of abstract sets of heap locations (i.e. exactly what
>> TBAA gives you).
>>
>>
>>
>> TBAA does not actually give you that. It gives you a tree that
>> describes a type hierarchy according to the memory access rules of
>> some language.
>>
>>
>> I disagree.
>>
>> It’s true that this is one use-case for TBAA, but that’s not how I use
>> it.  I don’t believe that I’m under any obligation to use it that way.
>>
>
> Then feel free to update langref, the tests, all the comments, and the
> name.
> I would strongly suggest you not choose this path. It sucks for the future
> for all the reasons I mentioned.
>
>
>
>> TBAA is just a format that LLVM consumes, that can be used by a
producer
>> of LLVM IR to convey aliasing information.
>>
>
> Again, I will repeat: Just because something can be done, doesn't mean
it
> should.
>
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20131008/5b3f1379/attachment.html>

Daniel Berlin

2013-Oct-08 19:48 UTC

head link

[LLVMdev] Request for comments: TBAA on call

On Tue, Oct 8, 2013 at 12:45 PM, Manman Ren <manman.ren at gmail.com>
wrote:
>
> TBAA MDNodes should describe the type hierarchy (with struct-path aware
> TBAA, it is a type DAG).
> It sounds okay to me to add !tbaa.call to function calls to describe which
> fields of the type system the call is touching.
>
I agree wholeheartedly :).  However, that's not what is being suggested.
He's suggesting using the tbaa tree to represent the output of an analysis,
not  a type hierarchy, because it's convenient and can represent the output
okay for his language.
My suggestion was that if he wants to do that, to add a different attribute
that works much the same way.

One example can be !tbaa.call {read a list of tag nodes, write a list
of> tag nodes}.
>
> Thanks,
> Manman
>
>
> On Mon, Oct 7, 2013 at 11:49 PM, Daniel Berlin <dberlin at
dberlin.org>wrote:
>
>>
>>
>>
>> On Mon, Oct 7, 2013 at 7:34 PM, Filip Pizlo <fpizlo at apple.com>
wrote:
>>
>>> I think we’re talking cross-purposes at this point.  Let me try to
step
>>> back and concisely restate what I’m trying to do:
>>>
>>> - I am writing a frontend for a high-level language and I emit LLVM
IR.
>>> - I control the TBAA tree.  I don’t use it to express the
high-level
>>> language’s types, since that would make absolutely no sense. 
JavaScript
>>> doesn’t have types.  But by the time my compiler is generating LLVM
IR, it
>>> will have come up with meaningful constraints on aliasing and those
>>> constraints are naturally expressed using TBAA.
>>> - Some calls that I emit in LLVM IR are neither readnone/readonly
nor do
>>> they clobber the world - instead they clobber a known set of memory
>>> locations, and that set of memory locations is already expressible
in the
>>> TBAA representation that I control.
>>>
>>> Which part of the above do you disagree with?
>>>
>>>
>> Nothing.
>> I disagree that just because you control the TBAA tree generation, that
>> you should shoehorn something that is neither what the langref claims
it
>> should be, or what other producers produce, just because the
datastructure
>> may kinda-sorta work for your purpose.
>>
>>
>>>  On Oct 7, 2013, at 4:57 PM, Daniel Berlin <dberlin at
dberlin.org> wrote:
>>>
>>> On Mon, Oct 7, 2013 at 4:22 PM, Filip Pizlo <fpizlo at
apple.com> wrote:
>>>
>>>
>>>
>>> On Oct 7, 2013, at 3:49 PM, Daniel Berlin <dberlin at
dberlin.org> wrote:
>>>
>>>
>>>
>>>
>>> On Wed, Sep 25, 2013 at 5:50 PM, Filip Pizlo <fpizlo at
apple.com> wrote:
>>>
>>>
>>> Hi all,
>>>
>>> TBAA on loads and stores is super useful for conveying high-level
type
>>> knowledge to LLVM transformations.  But often translating a
high-level
>>> language into LLVM IR will result in runtime calls that are known
at the
>>> high level to only affect (either by reading or by writing) some
restricted
>>> subset of the heap.  These aren't read-only calls; that is
often too strong
>>> of a guarantee.  A great example would be a GC barrier or GC
allocation
>>> that may arise if LLVM is used as a backend for a compiler for a
>>> higher-level language.  There are probably other examples that
arise in
>>> high-level languages, but let's use allocation as an example
for now.  I
>>> can envision two ways of representing allocation in LLVM IR:
>>>
>>> Method #1:
>>>
>>>            %object = call @runtime_allocate(... /* pass size and/or
>>> other data */)
>>>
>>> Method #2:
>>>
>>>        SomeBlock:
>>>            ... ; code preceding the allocation
>>>            %objectFast = ... ; IR for the inlined allocation fast
path,
>>> which produces %objectFast but may yield null if the fast path
fails and we
>>> need to take slow path
>>>            %didFail = icmp eq %objectFast, null
>>>            br %didFail, label %AllocationSlowPath, label
&Continue
>>>        AllocationSlowPath:
>>>            %objectSlow = call @runtime_allocate_slow_path(... /*
pass
>>> size and/or other data */)
>>>            br label %Continue
>>>        Continue:
>>>            %object = phi [%objectFast, SomeBlock], [%objectSlow,
>>> AllocationSlowPath]
>>>
>>> In both of these cases, the call instruction will appear to clobber
the
>>> world leading to more pessimistic optimization.  Moreover, it's
not always
>>> correct to mark the allocation as being readonly; it may read or
write
>>> observable state.  But if you don't like the allocation example
then you
>>> can imagine that a GC store barrier would have an almost identical
pattern
>>> to #2 and it would definitely not be readonly.  Anyway, such a
pattern
>>> would lead to fewer loads being eliminated or hoisted, fewer stores
being
>>> eliminated or sunk, etc - all due to those pesky calls.  But the
high-level
>>> code generator that produces this IR will know for sure that
>>> @runtime_allocate or @runtime_allocate_slow_path can only affect a
very
>>> narrow subset of the heap: namely, it will only read and write the
GC's
>>> data structures.  So most of the loads and stores surrounding the
>>> allocation, that arose from source-level loads and stores, would
definitely
>>> not have any interference with the call and this would !
>>> be easy to convey using TBAA.
>>>
>>> Hence if we could decorate the calls to these runtime functions as
only
>>> reading or writing certain TBAA types,
>>>
>>>
>>>
>>> Out of curiosity, why is this tied to TBAA types, rather than
variables
>>> or something else?
>>> IE what analysis are you performing that gives you type answers
instead
>>> of answers in terms of local/global variables?
>>>
>>>
>>> TBAA’s utility for non-C languages (like JavaScript, in my case)
isn’t
>>> for reasoning about user-facing types, but for creating what you
could
>>> alternatively think of as an “abstract heaps”.  TBAA is perfect for
this.
>>>
>>>
>>>
>>> No. A nested hierarchy that describes conflicts among possible
>>> abstract heap locations is perfect for this. TBAA is not quite the
>>> same. See below.
>>>
>>>
>>> For example I might know that a certain function clobbers a field
called
>>> ‘foo’.  In my frontend I create an "abstract heap" for
each field name, and
>>> this gets lowered to LLVM TBAA.
>>>
>>>
>>>
>>> So let's start simple.
>>>
>>> You are representing an abstract set of heap locations. Are they
>>> actually in any way related to the *existing type tree* that is
being
>>> created for TBAA?
>>>
>>>
>>> Yes.  Because I am also creating the TBAA type tree.  I control it
>>> entirely.
>>>
>>
>> So then the answer is really "no", in the sense that what you
want to
>> produce  is unlike any of the current producers or what is actually
>> described :)
>>
>>
>>>
>>> Or are you adding new values to it that are not
>>> really part of a type hierarchy at all, but represent some other
>>> relation among abstract heaps?
>>> From your description (you create new things that represent
abstract
>>> heap locations and lower that into a nested tree), it sounds like
the
>>> answer is "you are adding new locations that represent a
different
>>> tree than the existing TBAA”.
>>>
>>>
>>> What do you mean by “the existing TBAA”?  To my understanding, LLVM
>>> itself never creates TBAA nodes; they arise entirely out of the
frontend
>>> that emits LLVM IR.  I control the frontend in this case.
>>>
>>>
>>> If that is not true, what is the actual relation to the existing
>>> information?
>>> TBAA in the LLVM compiler, at least as currently produced, while
>>> theoretically compatible with this scheme , is not quite the same.
>>> The langref says "metadata is added to the IR to describe a
type
>>> system of a higher level language.”
>>>
>>>
>>> My understanding is that LLVM doesn’t produce TBAA.  It consumes
TBAA.
>>>
>> True.
>>
>>
>>>
>>> Hence it’s more meaningful to reason about TBAA in terms of its
>>> semantics rather than hypothesizing about how and why someone would
produce
>>> it.
>>>
>>
>> That would be great, but it's not what the langref says, nor does
it
>> match up with the name of the thing you are creating, nor does it
>> necessarily have optimal semantics, nor would it be great for future
>> producers or the ability to do other analyses in those producers.
>>
>>
>>>  We agree that TBAA is a nested set structure that describes heap
>>> locations.
>>>
>>
>> Though note that it is bidirectional which is actually probably
>> non-optimal for describing heap locations that are unrelated to types.
>>
>>
>>
>>>   I’m observing that it would be useful to express bounds on what
heap
>>> locations that calls read or write.  Therefore, TBAA can be used to
convey
>>> those sets.
>>>
>>
>> I do not disagree that it can, i disagree that it *should*.
>>
>>
>>>
>>>
>>> If you are adding a new tree just to represent new abstract heap
>>> location info, you are essentially reusing it simply because it
>>> represents a nested set structure.   I don't see a lot of value
in
>>> that sort of conflation, hence the question.
>>>
>>>
>>> If we agree that it would be good to have a system for bounding
what a
>>> call reads or writes using nested sets, then we have two choices:
>>>
>>> - Reuse TBAA, which can already be used to express nested sets of
memory
>>> locations.
>>>
>> Depending upon the properties of those sets, yes, I agree.
>>
>>
>>>
>>> - Invent a whole new type system that is structurally identical to
TBAA
>>> but that goes by a different name.
>>>
>>
>> Why would it necessarily be structurally identical?
>> TBAA is a good approximation for your particular analysis, but it's
>> nested set representation will not necessarily be optimal in the end
for
>> what else people produce for call based info
>> By shoving one into the other, you force this representation on
everyone,
>> essentially until someone else is going to come along and untangle them
>> again.
>> You also force all producers that want to represent the output of some
>> kind of analyses around calls, *and* actual type based alias info, to
>> combine both into a single tree.
>>
>>
>>> Are you seriously proposing the latter?
>>>
>>> Yes, I am seriously proposing that you do not shoehorn something
>> completely different than what the langref describes, and the code
expects,
>> into an existing structure.
>>
>> If you want to change the langref, all the tests, the name,  etc,
first,
>>  I would object, but that would certainly be less ugly to do that in
>> addition to your proposal.
>>
>>>
>>> If not, can you describe in more detail how you are generating this
info?
>>> Because that is what i get from your description.
>>>
>>> The information you are talking about here is traditional mod/ref
info
>>> that is completely independent of TBAA.
>>>
>>>
>>> You lost me.  I’m writing a compiler that generates LLVM IR.  I
have
>>> high-level knowledge that allows me to prove the set of abstract
heap
>>> locations that may be read or written by any call, load, or store.
>>>
>>> What do you suggest as a mechanism by which I can communicate that
>>> information to LLVM?
>>>
>>>
>>> I'm suggesting you create an attribute that deals with abstract
heap
>>> locations that you are describing, instead of overloading and tying
it
>>> into an existing set of information that  has a different purpose
>>> (TBAA, which at least in the current world, represents a nested
type
>>> tree that describes access rules of a language).
>>>
>>>
>>> So, you’re suggesting inventing a new representation for type
>>> hierarchies,
>>>
>>
>> I assume you mean abstract heap locations.
>>
>>
>>>  that would have the following properties:
>>>
>>> - The same rules as TBAA for load/store.
>>>
>>
>> For starters, yes.
>> Note that the bidirectional (both up and down the tree) check does TBAA
>> is not necessarily needed, depending on how you formulate it.
>>
>>>
>>> - The same rules as what I propose for call.
>>>
>>> - Identical to TBAA in all other respects, except that it would
have a
>>> different name.
>>>
>>> Am I understanding your proposal correctly?
>>>
>>
>> You are understanding where I suggest you start, yes.  I expect it will
>> slowly grow to be quite different than the existing TBAA tree.
>>
>>
>>>
>>>
>>> Are you saying that such a mechanism already exists and that I
should
>>> use that instead?
>>>
>>> As far as I can tell, TBAA is the cleanest such mechanism that I
have
>>> found thus far and its only shortcoming is that I cannot use it to
annotate
>>> calls.
>>>
>>>
>>> The fact that some other mechanism is
kinda-sorta-like-what-you-want
>>> does not necessarily mean it should be tied into it :)
>>>
>>> Hence, I am trying to understand the actual relation between the
TBAA
>>> tree as it exists now
>>>
>>>
>>> What TBAA tree?
>>>
>> The one produced by the only current producers.
>> The one whose purpose is described by the langref, and even the
comments
>> in various places
>>
>>
>>>   Are you referring to the TBAA format as described in
>>> http://llvm.org/docs/LangRef.html#tbaa-metadata, or some specific
TBAA
>>> tree that exists for some specific frontend for some specific
language?
>>>
>>
>> Let's say "both".
>> Since the TBAA tree produced by other producers all fall into the
>> category of "actual high level type hierarchy info", as the
langref
>> describes.
>>
>>>
>>>  , and what you are suggesting creating.
>>> For example, i could perform pointer analysis in clang and generate
a
>>> TBAA tree from that. It would work.  It would be easy.  It
doesn't
>>> mean it is necessarily a great approach.
>>>
>>>
>>> Would your pointer analysis generate either disjoint sets, or
nested
>>> sets?
>>>
>>
>> This is entirely possible, for example, all the andersens's will
end up
>> directional.
>> All of context-sensitive will end up another way.
>> Is your suggestion that they then drop all this info because it
can't be
>> shoehorned into the existing TBAA format?
>> And that someone else later disentangle this from the actual *type
based
>> info* being produced by other frontends?
>> What if someone wants to produce both kinds of info?
>> Right now your description is essentially the output of an analysis
that
>> can be run just as well in clang on C/C++
>>
>> For producers that *also* want to convey type based alias info and
>> abstract heap location info for calls,  it now requires combining
multiple
>> analysis outputs into a single tree called, confusingly, a "TBAA
tree".
>>
>> Can you explain what, other than saving you the same amount of work
that
>> was recently done for struct-tbaa,  your proposal buys?
>>
>>
>>
>>
>>
>>>  For example, if you did a Steensgaard-style pointer analysis then
TBAA
>>> *would* be the most ideal way of representing the results of such
an
>>> analysis.
>>>
>>
>> Actually, no.
>> For a consumer, if it's static, the ideal representation for a
*TBAA
>> tree* would be a simple list of nodes and dfs in/out numbers attached
to
>> those nodes, produced from walking the tree as it exists in your
frontend.
>>  There is no reason to explicitly represent the actual child/parent
>> relationships among the nodes.
>>
>> You can then check ancestry by doing what we do elsewhere (for example,
>> dominates(a, b) checking), and use the DFS numbers to check ancestry.
>>  There is no need to actually represent the tree, or walk it, as LLVM
does
>> now.  This method will give you constant time and less space usage than
>> what you have now.
>>
>> If the TBAA tree does not really represent anything in particular,
giving
>> it explicit structure when it's structure means nothing is entirely
>> pointless.
>>
>> In the case of steensgaard in particular, if you are just asking
>> aliases(x, y) questions, you actually don't need the points-to
>> relationships as represented by ECR's, only something much simpler
-
>> whether two things are in the same disjoint set.  You don't need
the union
>> find structure to do this once you've finished the algorithm, you
could
>> just number the equivalence classes and go from there.
>>
>> Of course, more complex queries about the relationships between the
ECR's
>> themselves (IE the points-to graph), you'd need parent-child
relationships.
>>
>> But this is all besides the point, except to point out it's not
actually
>> optimal for what you are suggesting representing :)
>>
>>
>>>
>>
>>
>>> One way to view my front end’s aliasing knowledge is that I’m doing
>>> something like a unification-based analysis.  I actually end up
with a
>>> three-level TBAA hierarchy (the world, with subtypes for my
disjoint sets
>>> of heap locations; some of those sets then have further subtypes).
>>>
>>
>> Yes.
>>
>>>
>>>
>>>
>>>
>>> The more specific the info you can get from the metadata, the
better off
>>> you are eliminating loads/stores
>>>
>>>
>>> Otherwise, yes, it's certainly incredibly helpful to the other
>>> optimizations.
>>>
>>> However, unless there is a good reason, i'd suggest you just
make it
>>> generic call read/write metadata that talks about values (be they
incoming
>>> arguments, alloca'd memory, or whatever else).
>>>
>>>
>>> You lost me.  How does this help a high-level compiler convey to
LLVM
>>> that a particular call and a particular load or store don’t have
any
>>> dependency?
>>>
>>>
>>>
>>> You add the modref info to the calls. You add attributes to the the
>>> loads/stores.  It becomes another disambiguation method to tell
>>> whether the load/store and the call have a dependency.
>>>
>>>
>>> Are you suggesting that I override AliasAnalysis?
>>>
>>> I’m trying to make it possible for a frontend to convey this
information
>>> using LLVM IR, rather than having to override internal LLVM classes
in
>>> order to provide this information.
>>>
>>
>> Can you  explain what the benefit of doing so is, past  some amount of
>> work ( that was even done recently in a similar contextwithout much
fanfare
>> in a not-horrible amount-of-time)?
>> I think i've described the downsides adequately:
>>
>> It's not an ideal representation depending on the analysis
>> For producers that *also* want to convey type based alias info in
>> addition to this type of info, it requires combining multiple analysis
>> outputs into a single tree. This gets messy, quickly.
>> It overloads something with a pretty well-defined meaning right now.
>> It's not actually optimal as a representation for the output of
most
>> kinds of analysis
>> It confuses a certain type of mod/ref information with actual
type-based
>> aliasing info.
>>
>>
>>
>>
>>> Hence there is a question of what format to use.  I’m arguing that
TBAA
>>> is that format.
>>>
>>
>>
>>>
>>>
>>>
>>> My proposal achieves this, albeit using a coarse-grained at a
>>> coarse-grained level: you have to express your knowledge of
dependencies
>>> using a hierarchy of abstract sets of heap locations (i.e. exactly
what
>>> TBAA gives you).
>>>
>>>
>>>
>>> TBAA does not actually give you that. It gives you a tree that
>>> describes a type hierarchy according to the memory access rules of
>>> some language.
>>>
>>>
>>> I disagree.
>>>
>>> It’s true that this is one use-case for TBAA, but that’s not how I
use
>>> it.  I don’t believe that I’m under any obligation to use it that
way.
>>>
>>
>> Then feel free to update langref, the tests, all the comments, and the
>> name.
>> I would strongly suggest you not choose this path. It sucks for the
>> future for all the reasons I mentioned.
>>
>>
>>
>>> TBAA is just a format that LLVM consumes, that can be used by a
producer
>>> of LLVM IR to convey aliasing information.
>>>
>>
>> Again, I will repeat: Just because something can be done, doesn't
mean it
>> should.
>>
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>
>>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20131008/5d4f0755/attachment.html>

Filip Pizlo

2013-Oct-10 16:08 UTC

head link

[LLVMdev] Request for comments: TBAA on call

On Oct 8, 2013, at 12:48 PM, Daniel Berlin <dberlin at dberlin.org> wrote:
> 
> 
> 
> On Tue, Oct 8, 2013 at 12:45 PM, Manman Ren <manman.ren at gmail.com>
wrote:
> 
> TBAA MDNodes should describe the type hierarchy (with struct-path aware
TBAA, it is a type DAG).
> It sounds okay to me to add !tbaa.call to function calls to describe which
fields of the type system the call is touching.
> 
> I agree wholeheartedly :).
OK, great.
> However, that's not what is being suggested.
> He's suggesting using the tbaa tree to represent the output of an
analysis, not  a type hierarchy, because it's convenient and can represent
the output okay for his language.
WebKit will continue to use TBAA this way.
> My suggestion was that if he wants to do that, to add a different attribute
that works much the same way.
Attributes should be defined in terms of their semantics in LLVM, not by their
philosophical intent.  If a static analysis produces results whose semantics can
be mapped soundly onto a type system that involves a type hierarchy, then this
is obviously the right thing to do.

-Filip
> 
> One example can be !tbaa.call {read a list of tag nodes, write a list of
tag nodes}.
> 
> Thanks,
> Manman
> 
> 
> On Mon, Oct 7, 2013 at 11:49 PM, Daniel Berlin <dberlin at
dberlin.org> wrote:
> 
> 
> 
> On Mon, Oct 7, 2013 at 7:34 PM, Filip Pizlo <fpizlo at apple.com>
wrote:
> I think we’re talking cross-purposes at this point.  Let me try to step
back and concisely restate what I’m trying to do:
> 
> - I am writing a frontend for a high-level language and I emit LLVM IR.
> - I control the TBAA tree.  I don’t use it to express the high-level
language’s types, since that would make absolutely no sense.  JavaScript doesn’t
have types.  But by the time my compiler is generating LLVM IR, it will have
come up with meaningful constraints on aliasing and those constraints are
naturally expressed using TBAA.
> - Some calls that I emit in LLVM IR are neither readnone/readonly nor do
they clobber the world - instead they clobber a known set of memory locations,
and that set of memory locations is already expressible in the TBAA
representation that I control.
> 
> Which part of the above do you disagree with?
> 
> 
> Nothing.
> I disagree that just because you control the TBAA tree generation, that you
should shoehorn something that is neither what the langref claims it should be,
or what other producers produce, just because the datastructure may kinda-sorta
work for your purpose.
>  
> On Oct 7, 2013, at 4:57 PM, Daniel Berlin <dberlin at dberlin.org>
wrote:
> 
>> On Mon, Oct 7, 2013 at 4:22 PM, Filip Pizlo <fpizlo at apple.com>
wrote:
>>> 
>>> 
>>> On Oct 7, 2013, at 3:49 PM, Daniel Berlin <dberlin at
dberlin.org> wrote:
>>> 
>>> 
>>> 
>>> 
>>> On Wed, Sep 25, 2013 at 5:50 PM, Filip Pizlo <fpizlo at
apple.com> wrote:
>>>> 
>>>> Hi all,
>>>> 
>>>> TBAA on loads and stores is super useful for conveying
high-level type knowledge to LLVM transformations.  But often translating a
high-level language into LLVM IR will result in runtime calls that are known at
the high level to only affect (either by reading or by writing) some restricted
subset of the heap.  These aren't read-only calls; that is often too strong
of a guarantee.  A great example would be a GC barrier or GC allocation that may
arise if LLVM is used as a backend for a compiler for a higher-level language. 
There are probably other examples that arise in high-level languages, but
let's use allocation as an example for now.  I can envision two ways of
representing allocation in LLVM IR:
>>>> 
>>>> Method #1:
>>>> 
>>>>            %object = call @runtime_allocate(... /* pass size
and/or other data */)
>>>> 
>>>> Method #2:
>>>> 
>>>>        SomeBlock:
>>>>            ... ; code preceding the allocation
>>>>            %objectFast = ... ; IR for the inlined allocation
fast path, which produces %objectFast but may yield null if the fast path fails
and we need to take slow path
>>>>            %didFail = icmp eq %objectFast, null
>>>>            br %didFail, label %AllocationSlowPath, label
&Continue
>>>>        AllocationSlowPath:
>>>>            %objectSlow = call @runtime_allocate_slow_path(...
/* pass size and/or other data */)
>>>>            br label %Continue
>>>>        Continue:
>>>>            %object = phi [%objectFast, SomeBlock],
[%objectSlow, AllocationSlowPath]
>>>> 
>>>> In both of these cases, the call instruction will appear to
clobber the world leading to more pessimistic optimization.  Moreover, it's
not always correct to mark the allocation as being readonly; it may read or
write observable state.  But if you don't like the allocation example then
you can imagine that a GC store barrier would have an almost identical pattern
to #2 and it would definitely not be readonly.  Anyway, such a pattern would
lead to fewer loads being eliminated or hoisted, fewer stores being eliminated
or sunk, etc - all due to those pesky calls.  But the high-level code generator
that produces this IR will know for sure that @runtime_allocate or
@runtime_allocate_slow_path can only affect a very narrow subset of the heap:
namely, it will only read and write the GC's data structures.  So most of
the loads and stores surrounding the allocation, that arose from source-level
loads and stores, would definitely not have any interference with the call and
this would !
>>>> be easy to convey using TBAA.
>>>> 
>>>> Hence if we could decorate the calls to these runtime functions
as only reading or writing certain TBAA types,
>>> 
>>> 
>>> Out of curiosity, why is this tied to TBAA types, rather than
variables or something else?
>>> IE what analysis are you performing that gives you type answers
instead of answers in terms of local/global variables?
>>> 
>>> 
>>> TBAA’s utility for non-C languages (like JavaScript, in my case)
isn’t for reasoning about user-facing types, but for creating what you could
alternatively think of as an “abstract heaps”.  TBAA is perfect for this.
>> 
>> 
>> No. A nested hierarchy that describes conflicts among possible
>> abstract heap locations is perfect for this. TBAA is not quite the
>> same. See below.
>> 
>>> 
>>> For example I might know that a certain function clobbers a field
called ‘foo’.  In my frontend I create an "abstract heap" for each
field name, and this gets lowered to LLVM TBAA.
>> 
>> 
>> So let's start simple.
>> 
>> You are representing an abstract set of heap locations. Are they
>> actually in any way related to the *existing type tree* that is being
>> created for TBAA?
> 
> Yes.  Because I am also creating the TBAA type tree.  I control it
entirely.
> 
> So then the answer is really "no", in the sense that what you
want to produce  is unlike any of the current producers or what is actually
described :)
>  
> 
>> Or are you adding new values to it that are not
>> really part of a type hierarchy at all, but represent some other
>> relation among abstract heaps?
>> From your description (you create new things that represent abstract
>> heap locations and lower that into a nested tree), it sounds like the
>> answer is "you are adding new locations that represent a different
>> tree than the existing TBAA”.
> 
> What do you mean by “the existing TBAA”?  To my understanding, LLVM itself
never creates TBAA nodes; they arise entirely out of the frontend that emits
LLVM IR.  I control the frontend in this case.
> 
>> 
>> If that is not true, what is the actual relation to the existing
information?
>> TBAA in the LLVM compiler, at least as currently produced, while
>> theoretically compatible with this scheme , is not quite the same.
>> The langref says "metadata is added to the IR to describe a type
>> system of a higher level language.”
> 
> My understanding is that LLVM doesn’t produce TBAA.  It consumes TBAA.
> True.
>  
> 
> Hence it’s more meaningful to reason about TBAA in terms of its semantics
rather than hypothesizing about how and why someone would produce it.
> 
> That would be great, but it's not what the langref says, nor does it
match up with the name of the thing you are creating, nor does it necessarily
have optimal semantics, nor would it be great for future producers or the
ability to do other analyses in those producers.
>  
>  We agree that TBAA is a nested set structure that describes heap
locations.
> 
> Though note that it is bidirectional which is actually probably non-optimal
for describing heap locations that are unrelated to types.
> 
>  
>  I’m observing that it would be useful to express bounds on what heap
locations that calls read or write.  Therefore, TBAA can be used to convey those
sets.
> 
> I do not disagree that it can, i disagree that it *should*.
>  
> 
>> 
>> If you are adding a new tree just to represent new abstract heap
>> location info, you are essentially reusing it simply because it
>> represents a nested set structure.   I don't see a lot of value in
>> that sort of conflation, hence the question.
> 
> If we agree that it would be good to have a system for bounding what a call
reads or writes using nested sets, then we have two choices:
> 
> - Reuse TBAA, which can already be used to express nested sets of memory
locations.
> Depending upon the properties of those sets, yes, I agree.
>  
> 
> - Invent a whole new type system that is structurally identical to TBAA but
that goes by a different name.
> 
> Why would it necessarily be structurally identical?
> TBAA is a good approximation for your particular analysis, but it's
nested set representation will not necessarily be optimal in the end for what
else people produce for call based info
> By shoving one into the other, you force this representation on everyone,
essentially until someone else is going to come along and untangle them again.
> You also force all producers that want to represent the output of some kind
of analyses around calls, *and* actual type based alias info, to combine both
into a single tree.
> 
> 
> Are you seriously proposing the latter?
> 
> Yes, I am seriously proposing that you do not shoehorn something completely
different than what the langref describes, and the code expects, into an
existing structure.
> 
> If you want to change the langref, all the tests, the name,  etc, first,  I
would object, but that would certainly be less ugly to do that in addition to
your proposal.
>> 
>> If not, can you describe in more detail how you are generating this
info?
>> Because that is what i get from your description.
>> 
>>> The information you are talking about here is traditional mod/ref
info that is completely independent of TBAA.
>>> 
>>> 
>>> You lost me.  I’m writing a compiler that generates LLVM IR.  I
have high-level knowledge that allows me to prove the set of abstract heap
locations that may be read or written by any call, load, or store.
>>> 
>>> What do you suggest as a mechanism by which I can communicate that
information to LLVM?
>> 
>> I'm suggesting you create an attribute that deals with abstract
heap
>> locations that you are describing, instead of overloading and tying it
>> into an existing set of information that  has a different purpose
>> (TBAA, which at least in the current world, represents a nested type
>> tree that describes access rules of a language).
> 
> So, you’re suggesting inventing a new representation for type hierarchies,
> 
> I assume you mean abstract heap locations.
>  
> that would have the following properties:
> 
> - The same rules as TBAA for load/store.
> 
> For starters, yes.
> Note that the bidirectional (both up and down the tree) check does TBAA is
not necessarily needed, depending on how you formulate it.
> 
> - The same rules as what I propose for call.
> 
> - Identical to TBAA in all other respects, except that it would have a
different name.
> 
> Am I understanding your proposal correctly?
>  
> You are understanding where I suggest you start, yes.  I expect it will
slowly grow to be quite different than the existing TBAA tree.
>  
>> 
>>> 
>>> Are you saying that such a mechanism already exists and that I
should use that instead?
>>> 
>>> As far as I can tell, TBAA is the cleanest such mechanism that I
have found thus far and its only shortcoming is that I cannot use it to annotate
calls.
>> 
>> The fact that some other mechanism is kinda-sorta-like-what-you-want
>> does not necessarily mean it should be tied into it :)
>> Hence, I am trying to understand the actual relation between the TBAA
>> tree as it exists now
> 
> What TBAA tree?
> The one produced by the only current producers.
> The one whose purpose is described by the langref, and even the comments in
various places
>  
>  Are you referring to the TBAA format as described in
http://llvm.org/docs/LangRef.html#tbaa-metadata, or some specific TBAA tree that
exists for some specific frontend for some specific language?
> 
> Let's say "both".
> Since the TBAA tree produced by other producers all fall into the category
of "actual high level type hierarchy info", as the langref describes.
> 
>> , and what you are suggesting creating.
>> For example, i could perform pointer analysis in clang and generate a
>> TBAA tree from that. It would work.  It would be easy.  It doesn't
>> mean it is necessarily a great approach.
> 
> Would your pointer analysis generate either disjoint sets, or nested sets?
> 
> This is entirely possible, for example, all the andersens's will end up
directional.
> All of context-sensitive will end up another way.
> Is your suggestion that they then drop all this info because it can't
be shoehorned into the existing TBAA format?
> And that someone else later disentangle this from the actual *type based
info* being produced by other frontends?
> What if someone wants to produce both kinds of info?
> Right now your description is essentially the output of an analysis that
can be run just as well in clang on C/C++
> 
> For producers that *also* want to convey type based alias info and abstract
heap location info for calls,  it now requires combining multiple analysis
outputs into a single tree called, confusingly, a "TBAA tree".
> 
> Can you explain what, other than saving you the same amount of work that
was recently done for struct-tbaa,  your proposal buys?
> 
> 
> 
>  
>  For example, if you did a Steensgaard-style pointer analysis then TBAA
*would* be the most ideal way of representing the results of such an analysis.
> 
> Actually, no.
> For a consumer, if it's static, the ideal representation for a *TBAA
tree* would be a simple list of nodes and dfs in/out numbers attached to those
nodes, produced from walking the tree as it exists in your frontend.  There is
no reason to explicitly represent the actual child/parent relationships among
the nodes.
> 
> You can then check ancestry by doing what we do elsewhere (for example,
dominates(a, b) checking), and use the DFS numbers to check ancestry.  There is
no need to actually represent the tree, or walk it, as LLVM does now.  This
method will give you constant time and less space usage than what you have now.
> 
> If the TBAA tree does not really represent anything in particular, giving
it explicit structure when it's structure means nothing is entirely
pointless.
> 
> In the case of steensgaard in particular, if you are just asking aliases(x,
y) questions, you actually don't need the points-to relationships as
represented by ECR's, only something much simpler - whether two things are
in the same disjoint set.  You don't need the union find structure to do
this once you've finished the algorithm, you could just number the
equivalence classes and go from there.
> 
> Of course, more complex queries about the relationships between the
ECR's themselves (IE the points-to graph), you'd need parent-child
relationships.
> 
> But this is all besides the point, except to point out it's not
actually optimal for what you are suggesting representing :)
> 
>  
>  
> One way to view my front end’s aliasing knowledge is that I’m doing
something like a unification-based analysis.  I actually end up with a
three-level TBAA hierarchy (the world, with subtypes for my disjoint sets of
heap locations; some of those sets then have further subtypes).
> 
> Yes.  
> 
>> 
>>> 
>>> 
>>> The more specific the info you can get from the metadata, the
better off you are eliminating loads/stores
>>> 
>>> 
>>> Otherwise, yes, it's certainly incredibly helpful to the other
optimizations.
>>> 
>>> However, unless there is a good reason, i'd suggest you just
make it generic call read/write metadata that talks about values (be they
incoming arguments, alloca'd memory, or whatever else).
>>> 
>>> 
>>> You lost me.  How does this help a high-level compiler convey to
LLVM that a particular call and a particular load or store don’t have any
dependency?
>> 
>> 
>> You add the modref info to the calls. You add attributes to the the
>> loads/stores.  It becomes another disambiguation method to tell
>> whether the load/store and the call have a dependency.
> 
> Are you suggesting that I override AliasAnalysis?
> 
> I’m trying to make it possible for a frontend to convey this information
using LLVM IR, rather than having to override internal LLVM classes in order to
provide this information.
> 
> Can you  explain what the benefit of doing so is, past  some amount of work
( that was even done recently in a similar contextwithout much fanfare in a
not-horrible amount-of-time)?
> I think i've described the downsides adequately:
> 
> It's not an ideal representation depending on the analysis
> For producers that *also* want to convey type based alias info in addition
to this type of info, it requires combining multiple analysis outputs into a
single tree. This gets messy, quickly.
> It overloads something with a pretty well-defined meaning right now.
> It's not actually optimal as a representation for the output of most
kinds of analysis
> It confuses a certain type of mod/ref information with actual type-based
aliasing info.
> 
> 
>   
> Hence there is a question of what format to use.  I’m arguing that TBAA is
that format.
>  
> 
>> 
>>> 
>>> My proposal achieves this, albeit using a coarse-grained at a
coarse-grained level: you have to express your knowledge of dependencies using a
hierarchy of abstract sets of heap locations (i.e. exactly what TBAA gives you).
>> 
>> 
>> TBAA does not actually give you that. It gives you a tree that
>> describes a type hierarchy according to the memory access rules of
>> some language.
> 
> I disagree.
> 
> It’s true that this is one use-case for TBAA, but that’s not how I use it. 
I don’t believe that I’m under any obligation to use it that way.
> 
> Then feel free to update langref, the tests, all the comments, and the
name.
> I would strongly suggest you not choose this path. It sucks for the future
for all the reasons I mentioned.
> 
>  
> TBAA is just a format that LLVM consumes, that can be used by a producer of
LLVM IR to convey aliasing information.
> 
> Again, I will repeat: Just because something can be done, doesn't mean
it should.
> 
> 
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20131010/78ab674d/attachment.html>

Apparently Analagous Threads

Search for more possibly parallel threads

llvm dev - Oct 2013 - [LLVMdev] Request for comments: TBAA on call

[LLVMdev] Request for comments: TBAA on call

[LLVMdev] Request for comments: TBAA on call

[LLVMdev] Request for comments: TBAA on call

Apparently Analagous Threads