thr3ads.net - llvm dev - [llvm-dev] [RFC] "Properly" Derive Function/Argument/Parameter Attributes [Aug 2018]

If this information is useful, please help other people find it:
Share via:

Johannes Doerfert via llvm-dev

2018-Aug-23 15:25 UTC

[llvm-dev] [RFC] "Properly" Derive Function/Argument/Parameter Attributes

After I spend some time working with the function attribute* deduction
pass** [1,3], I would like to propose a "proper" organization***.



Why?

  Because we do not derive nearly as many attributes as we could****,
  while we do maintain various (separate and diffently organized)
  "data-flow-like analyses" to do so.



What else?

  I propose a single optimistic data-flow (fixpoint) loop to perform the
  deduction (similar to [2,3] but for all propagation forms and not only
  call sites -> callee). Given that a pessimistic initialization allows
  to terminate early, we can then also vary the compile time investment.



Where is the catch?

  It will require a substantial rewrite with (some) parts that cannot be
  split in small independent patches. While I believe these to be easy
  to review*****, I want to know if
    (1) there is interest in having better attribute deduction, and
    (2) there are volunteers to review such patches.



I do appreciate any input on this proposal.

Cheers,
  Johannes

--------

* It derives function attributes but also parameter and argument
  attributes.

** lib/Transforms/IPO/FunctionAttrs.cpp

*** This statement is intended to sound harsh. I believe it to be
    appropriate because it is hard to find consistency in the current
    implementation. Different parts perform argument deduction similarly
    but still different. This does lead to code duplication and missed
    deductions as there is no information exchange going on.

****  This includes missing attributes in existing propagations
      e.g., dereferencability (see also [0,1]), missing forms of
      propagation, e.g., call sites -> callee (see [2,3]), as well as
      missing deductions due to the fact that there is no (global)
      fixpoint iteration. Additional examples to showcase some current
      shortcomings are attached to this mail.

***** In the sense that they will "just contain" the implementation of
a
      fixpoint data-flow analysis. The logic to determine/eliminate
      attributes will be similar to the one we have now.


[0] https://reviews.llvm.org/D48387
[1] https://reviews.llvm.org/D50107
[2] https://reviews.llvm.org/D4609
[3] https://reviews.llvm.org/D50125

-- 

Johannes Doerfert
PhD Student / Researcher

Compiler Design Lab (Professor Hack) / Argonne National Laboratory
Saarland Informatics Campus, Germany / Lemont, IL 60439, USA
Building E1.3, Room 4.31

Tel. +49 (0)681 302-57521 : doerfert at cs.uni-saarland.de / jdoerfert at
anl.gov
Fax. +49 (0)681 302-3065  : http://www.cdl.uni-saarland.de/people/doerfert
-------------- next part --------------
; RUN: opt -S -functionattrs -enable-nonnull-arg-prop %s | FileCheck %s
;
; This is an evolved example to stress test SCC parameter attribute propagation.
; The SCC in this test is made up of the following six function, three of which
; are internal and three externally visible:
;
; static int* internal_ret0_nw(int *n0, int *w0);
; static int* internal_ret1_rw(int *r0, int *w0);
; static int* internal_ret1_rrw(int *r0, int *r1, int *w0);
;        int* external_ret2_nrw(int *n0, int *r0, int *w0);
;        int* external_sink_ret2_nrw(int *n0, int *r0, int *w0);
;        int* external_source_ret2_nrw(int *n0, int *r0, int *w0);
;
; The top four functions call each other while the "sink" function
will not
; call anything and the "source" function will not be called in this
module.
; The names of the functions define the returned parameter (X for
"_retX_"),
; as well as how the parameters are (transitively) used (n = readnone,
; r = readonly, w = writeonly).
;
; Currently we get the following statistics:
;   1 functionattrs - Number of functions marked as norecurse
;   6 functionattrs - Number of functions marked as nounwind
;   1 functionattrs - Number of arguments marked nocapture
;   1 functionattrs - Number of arguments marked readnone
;   1 functionattrs - Number of arguments marked readonly
;   1 functionattrs - Number of arguments marked returned
;  --
;  11
;
; This basically describes that we only derived attributes for the
"sink"
; function (except nounwind which is properly derived).
;
;
; What we should get is something along the lines of:
;   1 functionattrs - Number of functions marked as norecurse
;   6 functionattrs - Number of functions marked argmemonly
;   6 functionattrs - Number of functions marked as nounwind
;  16 functionattrs - Number of arguments marked nocapture
;   4 functionattrs - Number of arguments marked readnone
;   6 functionattrs - Number of arguments marked writeonly
;   6 functionattrs - Number of arguments marked readonly
;   6 functionattrs - Number of arguments marked returned
;  --
;  51
;
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

; Function Attrs: noinline nounwind uwtable
define dso_local i32* @external_ret2_nrw(i32* %n0, i32* %r0, i32* %w0) #0 {
entry:
  %call = call i32* @internal_ret0_nw(i32* %n0, i32* %w0)
  %call1 = call i32* @internal_ret1_rrw(i32* %r0, i32* %r0, i32* %w0)
  %call2 = call i32* @external_sink_ret2_nrw(i32* %n0, i32* %r0, i32* %w0)
  %call3 = call i32* @internal_ret1_rw(i32* %r0, i32* %w0)
  ret i32* %call3
}

; Function Attrs: noinline nounwind uwtable
define internal i32* @internal_ret0_nw(i32* %n0, i32* %w0) #0 {
entry:
  %r0 = alloca i32, align 4
  %r1 = alloca i32, align 4
  %tobool = icmp ne i32* %n0, null
  br i1 %tobool, label %if.end, label %if.then

if.then:                                          ; preds = %entry
  br label %return

if.end:                                           ; preds = %entry
  store i32 3, i32* %r0, align 4
  store i32 5, i32* %r1, align 4
  store i32 1, i32* %w0, align 4
  %call = call i32* @internal_ret1_rrw(i32* %r0, i32* %r1, i32* %w0)
  %call1 = call i32* @external_ret2_nrw(i32* %n0, i32* %r0, i32* %w0)
  %call2 = call i32* @external_ret2_nrw(i32* %n0, i32* %r1, i32* %w0)
  %call3 = call i32* @external_sink_ret2_nrw(i32* %n0, i32* %r0, i32* %w0)
  %call4 = call i32* @external_sink_ret2_nrw(i32* %n0, i32* %r1, i32* %w0)
  %call5 = call i32* @internal_ret0_nw(i32* %n0, i32* %w0)
  br label %return

return:                                           ; preds = %if.end, %if.then
  %retval.0 = phi i32* [ %call5, %if.end ], [ %n0, %if.then ]
  ret i32* %retval.0
}

; Function Attrs: noinline nounwind uwtable
define internal i32* @internal_ret1_rrw(i32* %r0, i32* %r1, i32* %w0) #0 {
entry:
  %0 = load i32, i32* %r0, align 4
  %tobool = icmp ne i32 %0, 0
  br i1 %tobool, label %if.end, label %if.then

if.then:                                          ; preds = %entry
  br label %return

if.end:                                           ; preds = %entry
  %call = call i32* @internal_ret1_rw(i32* %r0, i32* %w0)
  %1 = load i32, i32* %r0, align 4
  %2 = load i32, i32* %r1, align 4
  %add = add nsw i32 %1, %2
  store i32 %add, i32* %w0, align 4
  %call1 = call i32* @internal_ret1_rw(i32* %r1, i32* %w0)
  %call2 = call i32* @internal_ret0_nw(i32* %r0, i32* %w0)
  %call3 = call i32* @internal_ret0_nw(i32* %w0, i32* %w0)
  %call4 = call i32* @external_ret2_nrw(i32* %r0, i32* %r1, i32* %w0)
  %call5 = call i32* @external_ret2_nrw(i32* %r1, i32* %r0, i32* %w0)
  %call6 = call i32* @external_sink_ret2_nrw(i32* %r0, i32* %r1, i32* %w0)
  %call7 = call i32* @external_sink_ret2_nrw(i32* %r1, i32* %r0, i32* %w0)
  %call8 = call i32* @internal_ret0_nw(i32* %r1, i32* %w0)
  br label %return

return:                                           ; preds = %if.end, %if.then
  %retval.0 = phi i32* [ %call8, %if.end ], [ %r1, %if.then ]
  ret i32* %retval.0
}

; Function Attrs: noinline nounwind uwtable
define dso_local i32* @external_sink_ret2_nrw(i32* %n0, i32* %r0, i32* %w0) #0 {
entry:
  %tobool = icmp ne i32* %n0, null
  br i1 %tobool, label %if.end, label %if.then

if.then:                                          ; preds = %entry
  br label %return

if.end:                                           ; preds = %entry
  %0 = load i32, i32* %r0, align 4
  store i32 %0, i32* %w0, align 4
  br label %return

return:                                           ; preds = %if.end, %if.then
  ret i32* %w0
}

; Function Attrs: noinline nounwind uwtable
define internal i32* @internal_ret1_rw(i32* %r0, i32* %w0) #0 {
entry:
  %0 = load i32, i32* %r0, align 4
  %tobool = icmp ne i32 %0, 0
  br i1 %tobool, label %if.end, label %if.then

if.then:                                          ; preds = %entry
  br label %return

if.end:                                           ; preds = %entry
  %call = call i32* @internal_ret1_rrw(i32* %r0, i32* %r0, i32* %w0)
  %1 = load i32, i32* %r0, align 4
  store i32 %1, i32* %w0, align 4
  %call1 = call i32* @internal_ret0_nw(i32* %r0, i32* %w0)
  %call2 = call i32* @internal_ret0_nw(i32* %w0, i32* %w0)
  %call3 = call i32* @external_sink_ret2_nrw(i32* %r0, i32* %r0, i32* %w0)
  %call4 = call i32* @external_ret2_nrw(i32* %r0, i32* %r0, i32* %w0)
  br label %return

return:                                           ; preds = %if.end, %if.then
  %retval.0 = phi i32* [ %call4, %if.end ], [ %w0, %if.then ]
  ret i32* %retval.0
}

; Function Attrs: noinline nounwind uwtable
define dso_local i32* @external_source_ret2_nrw(i32* %n0, i32* %r0, i32* %w0) #0
{
entry:
  %call = call i32* @external_sink_ret2_nrw(i32* %n0, i32* %r0, i32* %w0)
  %call1 = call i32* @external_ret2_nrw(i32* %n0, i32* %r0, i32* %w0)
  ret i32* %call1
}
-------------- next part --------------
; RUN: opt -S -functionattrs -enable-nonnull-arg-prop %s | FileCheck %s

; In the following SCC, one function (scc_bottom_vs_top_fn_top) cannot be
; marked nonnull. We should still mark the others as they will always return
; nonnull pointers. However, this only works if we initialize our abstract
; state with bottom (not top), thus optimistically assume nonnull until proven
; otherwise.

declare nonnull i8* @ret_nonnull()

; CHECK: define i8* @scc_bottom_vs_top_fn_top
define i8* @scc_bottom_vs_top_fn_top() {
  call i8* @scc_bottom_vs_top_fn_scc0()
  call i8* @scc_bottom_vs_top_fn_scc1()
  ret i8* null
}

; CHECK: define nonnull i8* @scc_bottom_vs_top_fn_scc0
define i8* @scc_bottom_vs_top_fn_scc0() {

; The null returing function is called here but the result is unused.
  call i8* @scc_bottom_vs_top_fn_top()

  br i1 undef, label %bb0, label %bb1
bb0:
  %ret0 = call i8* @ret_nonnull()
  ret i8* %ret0
bb1:
  %ret1 = call i8* @scc_bottom_vs_top_fn_scc1()
  ret i8* %ret1
}

; CHECK: define nonnull i8* @scc_bottom_vs_top_fn_scc1
define i8* @scc_bottom_vs_top_fn_scc1() {

; The null returing function is called here but the result is unused.
  call i8* @scc_bottom_vs_top_fn_top()

  br i1 undef, label %bb0, label %bb1
bb0:
  %ret0 = call i8* @ret_nonnull()
  ret i8* %ret0
bb1:
  %ret1 = call i8* @scc_bottom_vs_top_fn_scc0()
  ret i8* %ret1
}

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 228 bytes
Desc: Digital signature
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20180823/b4ab1a06/attachment.sig>

Hal Finkel via llvm-dev

2018-Sep-01 23:37 UTC

head link

[llvm-dev] [RFC] "Properly" Derive Function/Argument/Parameter Attributes

Hi, Johannes,

I'm certainly in favor of rewriting the attribute deduction so that it's
more consistent and more powerful. There's a lot more that we could do.
I'll help to review the patches.

On 08/23/2018 10:25 AM, Johannes Doerfert via llvm-dev
wrote:> After I spend some time working with the function attribute* deduction
> pass** [1,3], I would like to propose a "proper" organization***.
>
>
>
> Why?
>
>   Because we do not derive nearly as many attributes as we could****,
>   while we do maintain various (separate and diffently organized)
>   "data-flow-like analyses" to do so.
>
>
>
> What else?
>
>   I propose a single optimistic data-flow (fixpoint) loop to perform the
>   deduction (similar to [2,3] but for all propagation forms and not only
>   call sites -> callee).
Right now, as I recall, we apply an optimistic fixed-point optimization
to each SCC, moving up the call graph from leaves to root. When you say
that you want a more-general inference, besides just call sites ->
callees (and the existing callees -> callers), can you please elaborate?
Doing both of these at the same time as part of a optimistic fixed-point
optimization seems to imply that the iteration will involve the entire
call graph at once (or, at least, each connected component at once). Is
that what you have in mind?
>  Given that a pessimistic initialization allows
>   to terminate early, we can then also vary the compile time investment.
This is certainly a true statement, but I don't understand what you're
proposing. Are you saying that we should have both an optimistic and
pessimistic mode?

Thanks again,
Hal
>
>
>
> Where is the catch?
>
>   It will require a substantial rewrite with (some) parts that cannot be
>   split in small independent patches. While I believe these to be easy
>   to review*****, I want to know if
>     (1) there is interest in having better attribute deduction, and
>     (2) there are volunteers to review such patches.
>
>
>
> I do appreciate any input on this proposal.
>
> Cheers,
>   Johannes
>
> --------
>
> * It derives function attributes but also parameter and argument
>   attributes.
>
> ** lib/Transforms/IPO/FunctionAttrs.cpp
>
> *** This statement is intended to sound harsh. I believe it to be
>     appropriate because it is hard to find consistency in the current
>     implementation. Different parts perform argument deduction similarly
>     but still different. This does lead to code duplication and missed
>     deductions as there is no information exchange going on.
>
> ****  This includes missing attributes in existing propagations
>       e.g., dereferencability (see also [0,1]), missing forms of
>       propagation, e.g., call sites -> callee (see [2,3]), as well as
>       missing deductions due to the fact that there is no (global)
>       fixpoint iteration. Additional examples to showcase some current
>       shortcomings are attached to this mail.
>
> ***** In the sense that they will "just contain" the
implementation of a
>       fixpoint data-flow analysis. The logic to determine/eliminate
>       attributes will be similar to the one we have now.
>
>
> [0] https://reviews.llvm.org/D48387
> [1] https://reviews.llvm.org/D50107
> [2] https://reviews.llvm.org/D4609
> [3] https://reviews.llvm.org/D50125
>
>
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20180901/b1dbf713/attachment.html>

Johannes Doerfert via llvm-dev

2018-Sep-04 01:27 UTC

head link

[llvm-dev] [RFC] "Properly" Derive Function/Argument/Parameter Attributes

Hi Hal,

thanks for your interest and questions.

On 09/01, Hal Finkel via llvm-dev wrote:> Hi, Johannes,
> 
> I'm certainly in favor of rewriting the attribute deduction so that
it's
> more consistent and more powerful. There's a lot more that we could do.
> I'll help to review the patches.
Even if we only do it to clean up the code, it is probably worth it.
Deriving new/different kinds of attributes, e.g., "dereferencable" and
"align", is another good reason. Finally, I detailed how we currently
derive attributes (below) to highlight the traversal duplications and
pessimistic choices we currently make.
> On 08/23/2018 10:25 AM, Johannes Doerfert via llvm-dev wrote:
> > After I spend some time working with the function attribute* deduction
> > pass** [1,3], I would like to propose a "proper"
organization***.
> >
> >
> >
> > Why?
> >
> >   Because we do not derive nearly as many attributes as we could****,
> >   while we do maintain various (separate and diffently organized)
> >   "data-flow-like analyses" to do so.
> >
> >
> >
> > What else?
> >
> >   I propose a single optimistic data-flow (fixpoint) loop to perform
the
> >   deduction (similar to [2,3] but for all propagation forms and not
only
> >   call sites -> callee).
> 
> Right now, as I recall, we apply an optimistic fixed-point optimization
> to each SCC, moving up the call graph from leaves to root.
Before I describe what I plan to below, I want to elaborate on the
current situation. Let me try to summarize it in a bit more detail:

We have two inference passes post order (po) and reverse post order
(rpo) of the call graph (CG). The second is not very interesting as it
is neither actual rpo nor do we use it for anything but to detect
non-recursive functions based on non-recursive callers (which seems to
be a special situation but I was unable to find any details on that).
Now the po traversal does many things, separately, in the following order:

  - addArgumentReturnedAttrs() -- "returned" parameter attribute.
    Traversal of all blocks to find an argument that (1) has the same
    type as the return type, and (2) is returned by all "return"
    instructions. This is not optimistic and return values of other
    functions in the SCC will prevent the deduction.

  - addReadAttrs() -- "readonly"/"readnone" function
attributes.
    Traverses all instructions in the SCC and optimistically determines
    "readonly"/"readnone" function attributes.

  - addArgumentAttrs() --
"nonnull"/"nocapture"/"readonly"/"readnone"
                          "writeonly" parameter attributes
    Traverse the uses of arguments through the SCC to determine
   
"nocapture"/"readonly"/"readnone"/"writeonly"
attributes.
    Additionally perform the callee -> call site deduction for
"nonnull"
    attributes if the call site is for sure executed. Note that
    "returned" pointers (in the SCC) are assumed captured and
therefore
    neither "readonly", "readnone" nor "writeonly"
even though they do
    not outlive the function/SCC.

  - addNoAliasAttrs() -- "noalias" function return value
    Check for "malloc-like" SCCs by verifying that if all return
values
    are "NULL", "undef" or "noalias" return values
of calls.

  - addNonNullAttrs() -- "nonnull" function return value
    Check for "nonnull" return values by verifying all function return
    values are non-null. However, this is not optimistic. Returning
    the return value of another function in the SCC is considered
    potentially null.

  - removeConvergentAttrs() -- remove "convergent" function attribute
    Traverse all instructions in the SCC and remove the "convergent"
    attribute if all contained call sites are "non-convergent".

  - addNoRecurseAttrs() -- "norecurse" function attribute
    Traverse all instructions in a singleton SCC and mark it as
"norecurse"
    if there is no self or potentially recursive call.
> When you say that you want a more-general inference, besides just call
> sites -> callees (and the existing callees -> callers), can you
please
> elaborate?  Doing both of these at the same time as part of a
> optimistic fixed-point optimization seems to imply that the iteration
> will involve the entire call graph at once (or, at least, each
> connected component at once). Is that what you have in mind?
There is certainly a large design space that allows variations. If we
properly traverse the SCCs in post order and optimistically derive all
attributes, it should already improve on the current state. However, the
callees -> call sites deduction requires to look into the callers of the
SCC as well, assuming all are known.

For the best result we probably need to combine three deduction methods:
  - If something is known for all call sites it is known for the callee.
  - If something is know for the callee it is known for all call sites.
  - If something can be derived from the code, e.g., guaranteed executed
    calls, it is known for the function.

One optimistic fixpoint iteration for the whole call graph is beneficial
if we can use information derived by one method for another one. This is
the case if we have domain knowledge, e.g., "read-only" annotations
from
the user or the front-end on an interior node in the call graph. For
descriptive properties, like "non-null", this can also work very well.
If we expect additional attributes of the latter kind, or more
annotations through the front-ends/users, we should consider this
solution.
> >  Given that a pessimistic initialization allows to terminate early,
> >  we can then also vary the compile time investment.
> 
> This is certainly a true statement, but I don't understand what
you're
> proposing. Are you saying that we should have both an optimistic and
> pessimistic mode?
I was just saying that we can easily use the same framework for both,
optimistic and pessimistic deduction:
  - If we want a fast result, or be able to stop the fixpoint iteration
    early, we start with a pessimistic state,
  - If we want the best possible result, we start with an optimistic
    state.

Does this make sense?

Cheers,
  Johannes
> Thanks again, Hal
> 
> >
> >
> >
> > Where is the catch?
> >
> >   It will require a substantial rewrite with (some) parts that
> >   cannot be split in small independent patches. While I believe
> >   these to be easy to review*****, I want to know if (1) there is
> >   interest in having better attribute deduction, and (2) there are
> >   volunteers to review such patches.
> >
> >
> >
> > I do appreciate any input on this proposal.
> >
> > Cheers, Johannes
> >
> > --------
> >
> > * It derives function attributes but also parameter and argument
> > attributes.
> >
> > ** lib/Transforms/IPO/FunctionAttrs.cpp
> >
> > *** This statement is intended to sound harsh. I believe it to be
> > appropriate because it is hard to find consistency in the current
> > implementation. Different parts perform argument deduction similarly
> > but still different. This does lead to code duplication and missed
> > deductions as there is no information exchange going on.
> >
> > ****  This includes missing attributes in existing propagations
> > e.g., dereferencability (see also [0,1]), missing forms of
> > propagation, e.g., call sites -> callee (see [2,3]), as well as
> > missing deductions due to the fact that there is no (global)
> > fixpoint iteration. Additional examples to showcase some current
> > shortcomings are attached to this mail.
> >
> > ***** In the sense that they will "just contain" the
implementation
> > of a fixpoint data-flow analysis. The logic to determine/eliminate
> > attributes will be similar to the one we have now.
> >
> >
> > [0] https://reviews.llvm.org/D48387 [1]
> > https://reviews.llvm.org/D50107 [2] https://reviews.llvm.org/D4609
> > [3] https://reviews.llvm.org/D50125
> >
> >
> >
> > _______________________________________________ LLVM Developers
> > mailing list llvm-dev at lists.llvm.org
> > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> 
> -- Hal Finkel Lead, Compiler Technology and Programming Languages
> Leadership Computing Facility Argonne National Laboratory
> 
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

-- 

Johannes Doerfert
PhD Student / Researcher

Compiler Design Lab (Professor Hack) / Argonne National Laboratory
Saarland Informatics Campus, Germany / Lemont, IL 60439, USA
Building E1.3, Room 4.31

Tel. +49 (0)681 302-57521 : doerfert at cs.uni-saarland.de / jdoerfert at
anl.gov
Fax. +49 (0)681 302-3065  : http://www.cdl.uni-saarland.de/people/doerfert
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 228 bytes
Desc: Digital signature
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20180904/e8650b22/attachment.sig>

Reasonably Related Threads

Search for more possibly parallel threads

llvm dev - Aug 2018 - [RFC] "Properly" Derive Function/Argument/Parameter Attributes

[llvm-dev] [RFC] "Properly" Derive Function/Argument/Parameter Attributes

[llvm-dev] [RFC] "Properly" Derive Function/Argument/Parameter Attributes

[llvm-dev] [RFC] "Properly" Derive Function/Argument/Parameter Attributes

Reasonably Related Threads