thr3ads.net - llvm dev - [llvm-dev] DCE in the presence of control flow. [Jan 2016]

If this information is useful, please help other people find it:
Share via:

David Callahan via llvm-dev

2016-Jan-30 04:31 UTC

[llvm-dev] DCE in the presence of control flow.

I think you can also avoid the RDF computation using a more directed form of
control dependence testing such as described in

Keshav Pingali and Gianfranco Bilardi. 1997. Optimal control dependence
computation and the Roman chariots problem. ACM Trans. Program. Lang. Syst. 19,
3 (May 1997), 462-491. DOI=http://dx.doi.org/10.1145/256167.256217

However one challenge seems to be fixing the SSA graph after deleting
essentially arbitrary connected regions of the control flow graph. It may be
that the common case where deleted control flow has a single entry and a common
post-dominator which seems straight forward but in the general case it seems
much harder. Any prior experience on that problem?
david

From: Hal Finkel <hfinkel at anl.gov<mailto:hfinkel at anl.gov>>
Date: Friday, January 29, 2016 at 4:50 PM
To: Daniel Berlin <dberlin at dberlin.org<mailto:dberlin at
dberlin.org>>
Cc: LLVM Dev Mailing list <llvm-dev at lists.llvm.org<mailto:llvm-dev at
lists.llvm.org>>, David Callahan <dcallahan at
fb.com<mailto:dcallahan at fb.com>>
Subject: Re: [llvm-dev] DCE in the presence of control flow.



________________________________
From: "Daniel Berlin" <dberlin at dberlin.org<mailto:dberlin at
dberlin.org>>
To: "Hal Finkel" <hfinkel at anl.gov<mailto:hfinkel at
anl.gov>>
Cc: "LLVM Dev Mailing list" <llvm-dev at
lists.llvm.org<mailto:llvm-dev at lists.llvm.org>>, "David
Callahan" <dcallahan at fb.com<mailto:dcallahan at fb.com>>
Sent: Friday, January 29, 2016 12:48:37 AM
Subject: Re: [llvm-dev] DCE in the presence of control flow.



On Thu, Jan 28, 2016 at 10:09 PM, Hal Finkel <hfinkel at
anl.gov<mailto:hfinkel at anl.gov>> wrote:

________________________________
From: "David Callahan via llvm-dev" <llvm-dev at
lists.llvm.org<mailto:llvm-dev at lists.llvm.org>>
To: "Daniel Berlin" <dberlin at dberlin.org<mailto:dberlin at
dberlin.org>>, "LLVM Dev Mailing list" <llvm-dev at
lists.llvm.org<mailto:llvm-dev at lists.llvm.org>>
Sent: Thursday, January 28, 2016 11:35:49 PM
Subject: Re: [llvm-dev] DCE in the presence of control flow.

Thanks
Also I found that some cases are also caught by a specialized routine to remove
dead loops which is missing the case I noticed.
odavd

From: Daniel Berlin <dberlin at dberlin.org<mailto:dberlin at
dberlin.org>>
Date: Thursday, January 28, 2016 at 8:45 PM
To: David Callahan <dcallahan at fb.com<mailto:dcallahan at
fb.com>>, LLVM Dev Mailing list <llvm-dev at
lists.llvm.org<mailto:llvm-dev at lists.llvm.org>>
Subject: Re: [llvm-dev] DCE in the presence of control flow.

The post dominators computation costs more on llvm than GCC because of how the
respective cfgs work under the covers.  Even for GCC, when we implemented
cd-dce, it only catches 1-2% more cases.  It's not really worth the cost in
llvm unless postdom comes free

A 1-2% reduction in code size seems like it might well be worth a post-dom
calculation.

1-2% more cases != 1-2% reduction in code size. In particular, it assumes
nothing else will catch those cases :)
Fair enough.

The cases are mostly caught by SimplifyCFG/etc anyway

In any case, here are the numbers from when it was turned on by default:

https://gcc.gnu.org/ml/gcc-patches/2004-01/msg00675.html

Note: GCC is at least 3x faster at computing post-dom than LLVM
Why?

Also, what exactly is the algorithm using post-dom info?


So, to review for a second, right now, the algorithm answers the question when
is a branch necessary with "always" :)

The real answer is "when an already-necessary operation depends on its
existence".  This is of course, requires control-dependence to answer.
So if you take our current DCE algorithm, and instead of marking terminators
always-live, it simply marks control dependent edges of those operands as
necessary, and branches that generate those edges.

(IE

for each block in RDF(useful block):
  mark terminator of block as useful


FWIW, Keith Cooper's slide deck on this has a nice explanation:
https://www.cs.rice.edu/~keith/512/2011/Lectures/L04Dead-1up.pdf

We might be able to do this without precomputing an RDF, however. For example,
you could solve a data-flow problem on the reverse CFG, where for each block you
solve for the "next" live instruction. Then a branch is alive only if
the next live instruction for each successor is different. You'd need to
have "next-live-instruction phi nodes" for cases where there's not
a unique answer, etc.

Thanks again,
Hal

I suppose that we're looking for cases where we have a CFG diamond with one
having only dead instructions, and loops with all dead instructions, etc.

Yes.  Loops with all dead instructions includes "loops with no side-effects
outside of the loop"


Thanks again,
Hal

On Wed, Jan 27, 2016, 1:56 PM David Callahan via llvm-dev <llvm-dev at
lists.llvm.org<mailto:llvm-dev at lists.llvm.org>> wrote:
I have been looking at some internal codes looking for differences between Clang
(specifically 3.7.1) and gcc (typically 4.8.1 but sometimes later).

One area where I bumped into was dead code elimination in the presence of
complex control flow. I note that the “aggressive dead code elimination”
(ADCE.cpp) treats all branch operations as live (isa<TerminatorInst>(I)). 
Doing more requires some approximation to control dependence.

I note SimplifyCFG indirectly handles some simple cases. It will speculate the
contents of a basic block into a predecessor but this is insufficient for more
complex structures. This probably cherry-picks the most common cases by
frequency.

Have their been prior attempts strengthen dead code elimination w.r.t. control
flow? If so, any guidance on what went wrong?

Thanks
David

_______________________________________________
LLVM Developers mailing list
llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org>
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev<https://urldefense.proofpoint.com/v2/url?u=http-3A__lists.llvm.org_cgi-2Dbin_mailman_listinfo_llvm-2Ddev&d=CwMFaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=lFyiPUrFdOHdaobP7i4hoA&m=pCcZikgFQttaHaETuHc6G00dgArj_Spf58imKkXlTqk&s=NTh5Q1gE2ANS1rQYN9XFok_t8wvWCu1dzzzvHfv3hlI&e=>

_______________________________________________
LLVM Developers mailing list
llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org>
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev<https://urldefense.proofpoint.com/v2/url?u=http-3A__lists.llvm.org_cgi-2Dbin_mailman_listinfo_llvm-2Ddev&d=CwMFaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=lFyiPUrFdOHdaobP7i4hoA&m=7nHK35pXeFc5plcmZyxIPJwB8O8ZyHCJ4P89mN8BoZQ&s=BIA3X52dktMwTkOdClno4EdVyZHbON1Yd_OB0ZehhB0&e=>



--
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory




--
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160130/8335c09e/attachment.html>

Daniel Berlin via llvm-dev

2016-Jan-30 06:26 UTC

head link

[llvm-dev] DCE in the presence of control flow.

In practice, APT is not faster to build than rdf.
The df calculator we use is linear time and quite fast.

Updating is also pretty trivial since it's only deletes of dead and
unreachable code.  So anything it reached can be replaced with undef in
most cases.

Cd-dce is not slower in GCC than dce

On Fri, Jan 29, 2016, 8:31 PM David Callahan <dcallahan at fb.com> wrote:
> I think you can also avoid the RDF computation using a more directed form
> of control dependence testing such as described in
>
> Keshav Pingali and Gianfranco Bilardi. 1997. Optimal control dependence
> computation and the Roman chariots problem. ACM Trans. Program. Lang. Syst.
> 19, 3 (May 1997), 462-491. DOI=http://dx.doi.org/10.1145/256167.256217
>
> However one challenge seems to be fixing the SSA graph after deleting
> essentially arbitrary connected regions of the control flow graph. It may
> be that the common case where deleted control flow has a single entry and
> a common post-dominator which seems straight forward but in the general
> case it seems much harder. Any prior experience on that problem?
> david
>
> From: Hal Finkel <hfinkel at anl.gov>
> Date: Friday, January 29, 2016 at 4:50 PM
> To: Daniel Berlin <dberlin at dberlin.org>
>
> Cc: LLVM Dev Mailing list <llvm-dev at lists.llvm.org>, David
Callahan <
> dcallahan at fb.com>
> Subject: Re: [llvm-dev] DCE in the presence of control flow.
>
>
>
> ------------------------------
>
> *From: *"Daniel Berlin" <dberlin at dberlin.org>
> *To: *"Hal Finkel" <hfinkel at anl.gov>
> *Cc: *"LLVM Dev Mailing list" <llvm-dev at lists.llvm.org>,
"David Callahan"
> <dcallahan at fb.com>
> *Sent: *Friday, January 29, 2016 12:48:37 AM
> *Subject: *Re: [llvm-dev] DCE in the presence of control flow.
>
>
>
> On Thu, Jan 28, 2016 at 10:09 PM, Hal Finkel <hfinkel at anl.gov>
wrote:
>
>>
>> ------------------------------
>>
>> *From: *"David Callahan via llvm-dev" <llvm-dev at
lists.llvm.org>
>> *To: *"Daniel Berlin" <dberlin at dberlin.org>,
"LLVM Dev Mailing list" <
>> llvm-dev at lists.llvm.org>
>> *Sent: *Thursday, January 28, 2016 11:35:49 PM
>> *Subject: *Re: [llvm-dev] DCE in the presence of control flow.
>>
>> Thanks
>> Also I found that some cases are also caught by a specialized routine
to
>> remove dead loops which is missing the case I noticed.
>> odavd
>>
>> From: Daniel Berlin <dberlin at dberlin.org>
>> Date: Thursday, January 28, 2016 at 8:45 PM
>> To: David Callahan <dcallahan at fb.com>, LLVM Dev Mailing list
<
>> llvm-dev at lists.llvm.org>
>> Subject: Re: [llvm-dev] DCE in the presence of control flow.
>>
>> The post dominators computation costs more on llvm than GCC because of
>> how the respective cfgs work under the covers.  Even for GCC, when we
>> implemented cd-dce, it only catches 1-2% more cases.  It's not
really worth
>> the cost in llvm unless postdom comes free
>>
>>
>> A 1-2% reduction in code size seems like it might well be worth a
>> post-dom calculation.
>>
>
> 1-2% more cases != 1-2% reduction in code size. In particular, it assumes
> nothing else will catch those cases :)
>
> Fair enough.
>
>
> The cases are mostly caught by SimplifyCFG/etc anyway
>
> In any case, here are the numbers from when it was turned on by default:
>
> https://gcc.gnu.org/ml/gcc-patches/2004-01/msg00675.html
>
> Note: GCC is at least 3x faster at computing post-dom than LLVM
>
> Why?
>
>
> Also, what exactly is the algorithm using post-dom info?
>>
>>
> So, to review for a second, right now, the algorithm answers the question
> when is a branch necessary with "always" :)
>
> The real answer is "when an already-necessary operation depends on its
> existence".  This is of course, requires control-dependence to answer.
> So if you take our current DCE algorithm, and instead of marking
> terminators always-live, it simply marks control dependent edges of those
> operands as necessary, and branches that generate those edges.
>
> (IE
>
> for each block in RDF(useful block):
>   mark terminator of block as useful
>
>
> FWIW, Keith Cooper's slide deck on this has a nice explanation:
> https://www.cs.rice.edu/~keith/512/2011/Lectures/L04Dead-1up.pdf
>
> We might be able to do this without precomputing an RDF, however. For
> example, you could solve a data-flow problem on the reverse CFG, where for
> each block you solve for the "next" live instruction. Then a
branch is
> alive only if the next live instruction for each successor is different.
> You'd need to have "next-live-instruction phi nodes" for
cases where
> there's not a unique answer, etc.
>
> Thanks again,
> Hal
>
>
> I suppose that we're looking for cases where we have a CFG diamond with
>> one having only dead instructions, and loops with all dead
instructions,
>> etc.
>>
>
> Yes.  Loops with all dead instructions includes "loops with no
> side-effects outside of the loop"
>
>
>>
>>
> Thanks again,
>> Hal
>>
>>
>> On Wed, Jan 27, 2016, 1:56 PM David Callahan via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>>> I have been looking at some internal codes looking for differences
>>> between Clang (specifically 3.7.1) and gcc (typically 4.8.1 but
sometimes
>>> later).
>>>
>>>
>>>
>>> One area where I bumped into was dead code elimination in the
presence
>>> of complex control flow. I note that the “aggressive dead code
elimination”
>>> (ADCE.cpp) treats all branch operations as live
(isa<TerminatorInst>(I)).
>>> Doing more requires some approximation to control dependence.
>>>
>>>
>>>
>>> I note SimplifyCFG indirectly handles some simple cases. It will
>>> speculate the contents of a basic block into a predecessor but this
is
>>> insufficient for more complex structures. This probably
cherry-picks the
>>> most common cases by frequency.
>>>
>>>
>>>
>>> Have their been prior attempts strengthen dead code elimination
w.r.t.
>>> control flow? If so, any guidance on what went wrong?
>>>
>>>
>>>
>>> Thanks
>>>
>>> David
>>>
>>>
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> llvm-dev at lists.llvm.org
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>
<https://urldefense.proofpoint.com/v2/url?u=http-3A__lists.llvm.org_cgi-2Dbin_mailman_listinfo_llvm-2Ddev&d=CwMFaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=lFyiPUrFdOHdaobP7i4hoA&m=pCcZikgFQttaHaETuHc6G00dgArj_Spf58imKkXlTqk&s=NTh5Q1gE2ANS1rQYN9XFok_t8wvWCu1dzzzvHfv3hlI&e=>
>>>
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>
<https://urldefense.proofpoint.com/v2/url?u=http-3A__lists.llvm.org_cgi-2Dbin_mailman_listinfo_llvm-2Ddev&d=CwMFaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=lFyiPUrFdOHdaobP7i4hoA&m=7nHK35pXeFc5plcmZyxIPJwB8O8ZyHCJ4P89mN8BoZQ&s=BIA3X52dktMwTkOdClno4EdVyZHbON1Yd_OB0ZehhB0&e=>
>>
>>
>>
>>
>> --
>> Hal Finkel
>> Assistant Computational Scientist
>> Leadership Computing Facility
>> Argonne National Laboratory
>>
>
>
>
>
> --
> Hal Finkel
> Assistant Computational Scientist
> Leadership Computing Facility
> Argonne National Laboratory
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160130/5b9fe66a/attachment.html>

David Callahan via llvm-dev

2016-Jan-30 15:02 UTC

head link

[llvm-dev] DCE in the presence of control flow.

I had assumed you would treat phi nodes differently from other operations in
that they don’t need to keep the block alive just to retain the data flow facts
but it would be simplest to do that.
Thanks Daniel

From: Daniel Berlin <dberlin at dberlin.org<mailto:dberlin at
dberlin.org>>
Date: Friday, January 29, 2016 at 10:26 PM
To: David Callahan <dcallahan at fb.com<mailto:dcallahan at
fb.com>>, Hal Finkel <hfinkel at anl.gov<mailto:hfinkel at
anl.gov>>
Cc: LLVM Dev Mailing list <llvm-dev at lists.llvm.org<mailto:llvm-dev at
lists.llvm.org>>
Subject: Re: [llvm-dev] DCE in the presence of control flow.

In practice, APT is not faster to build than rdf.
The df calculator we use is linear time and quite fast.

Updating is also pretty trivial since it's only deletes of dead and
unreachable code.  So anything it reached can be replaced with undef in most
cases.

Cd-dce is not slower in GCC than dce

On Fri, Jan 29, 2016, 8:31 PM David Callahan <dcallahan at
fb.com<mailto:dcallahan at fb.com>> wrote:
I think you can also avoid the RDF computation using a more directed form of
control dependence testing such as described in

Keshav Pingali and Gianfranco Bilardi. 1997. Optimal control dependence
computation and the Roman chariots problem. ACM Trans. Program. Lang. Syst. 19,
3 (May 1997), 462-491.
DOI=http://dx.doi.org/10.1145/256167.256217<https://urldefense.proofpoint.com/v2/url?u=http-3A__dx.doi.org_10.1145_256167.256217&d=CwMFaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=lFyiPUrFdOHdaobP7i4hoA&m=85DfLWatM6urqzYLUvrWBgn5LKUxCbQ3YheI3kh5XdU&s=qJbIjqiOKcfzbhZ33-9K0SGsouUl_rvtRgEareBdqGY&e=>

However one challenge seems to be fixing the SSA graph after deleting
essentially arbitrary connected regions of the control flow graph. It may be
that the common case where deleted control flow has a single entry and a common
post-dominator which seems straight forward but in the general case it seems
much harder. Any prior experience on that problem?
david

From: Hal Finkel <hfinkel at anl.gov<mailto:hfinkel at anl.gov>>
Date: Friday, January 29, 2016 at 4:50 PM
To: Daniel Berlin <dberlin at dberlin.org<mailto:dberlin at
dberlin.org>>

Cc: LLVM Dev Mailing list <llvm-dev at lists.llvm.org<mailto:llvm-dev at
lists.llvm.org>>, David Callahan <dcallahan at
fb.com<mailto:dcallahan at fb.com>>
Subject: Re: [llvm-dev] DCE in the presence of control flow.

________________________________
From: "Daniel Berlin" <dberlin at dberlin.org<mailto:dberlin at
dberlin.org>>
To: "Hal Finkel" <hfinkel at anl.gov<mailto:hfinkel at
anl.gov>>
Cc: "LLVM Dev Mailing list" <llvm-dev at
lists.llvm.org<mailto:llvm-dev at lists.llvm.org>>, "David
Callahan" <dcallahan at fb.com<mailto:dcallahan at fb.com>>
Sent: Friday, January 29, 2016 12:48:37 AM
Subject: Re: [llvm-dev] DCE in the presence of control flow.

On Thu, Jan 28, 2016 at 10:09 PM, Hal Finkel <hfinkel at
anl.gov<mailto:hfinkel at anl.gov>> wrote:

________________________________
From: "David Callahan via llvm-dev" <llvm-dev at
lists.llvm.org<mailto:llvm-dev at lists.llvm.org>>
To: "Daniel Berlin" <dberlin at dberlin.org<mailto:dberlin at
dberlin.org>>, "LLVM Dev Mailing list" <llvm-dev at
lists.llvm.org<mailto:llvm-dev at lists.llvm.org>>
Sent: Thursday, January 28, 2016 11:35:49 PM
Subject: Re: [llvm-dev] DCE in the presence of control flow.

Thanks
Also I found that some cases are also caught by a specialized routine to remove
dead loops which is missing the case I noticed.
odavd

From: Daniel Berlin <dberlin at dberlin.org<mailto:dberlin at
dberlin.org>>
Date: Thursday, January 28, 2016 at 8:45 PM
To: David Callahan <dcallahan at fb.com<mailto:dcallahan at
fb.com>>, LLVM Dev Mailing list <llvm-dev at
lists.llvm.org<mailto:llvm-dev at lists.llvm.org>>
Subject: Re: [llvm-dev] DCE in the presence of control flow.

The post dominators computation costs more on llvm than GCC because of how the
respective cfgs work under the covers.  Even for GCC, when we implemented
cd-dce, it only catches 1-2% more cases.  It's not really worth the cost in
llvm unless postdom comes free

A 1-2% reduction in code size seems like it might well be worth a post-dom
calculation.

1-2% more cases != 1-2% reduction in code size. In particular, it assumes
nothing else will catch those cases :)
Fair enough.

The cases are mostly caught by SimplifyCFG/etc anyway

In any case, here are the numbers from when it was turned on by default:

https://gcc.gnu.org/ml/gcc-patches/2004-01/msg00675.html

Note: GCC is at least 3x faster at computing post-dom than LLVM
Why?

Also, what exactly is the algorithm using post-dom info?

So, to review for a second, right now, the algorithm answers the question when
is a branch necessary with "always" :)

The real answer is "when an already-necessary operation depends on its
existence".  This is of course, requires control-dependence to answer.
So if you take our current DCE algorithm, and instead of marking terminators
always-live, it simply marks control dependent edges of those operands as
necessary, and branches that generate those edges.

(IE

for each block in RDF(useful block):
  mark terminator of block as useful

FWIW, Keith Cooper's slide deck on this has a nice explanation:
https://www.cs.rice.edu/~keith/512/2011/Lectures/L04Dead-1up.pdf<https://urldefense.proofpoint.com/v2/url?u=https-3A__www.cs.rice.edu_-7Ekeith_512_2011_Lectures_L04Dead-2D1up.pdf&d=CwMFaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=lFyiPUrFdOHdaobP7i4hoA&m=85DfLWatM6urqzYLUvrWBgn5LKUxCbQ3YheI3kh5XdU&s=C3xP_URm9hGCDdAsg-pKQsJ6sWhEVYgyYAGKbhC6ENU&e=>

We might be able to do this without precomputing an RDF, however. For example,
you could solve a data-flow problem on the reverse CFG, where for each block you
solve for the "next" live instruction. Then a branch is alive only if
the next live instruction for each successor is different. You'd need to
have "next-live-instruction phi nodes" for cases where there's not
a unique answer, etc.

Thanks again,
Hal

I suppose that we're looking for cases where we have a CFG diamond with one
having only dead instructions, and loops with all dead instructions, etc.

Yes.  Loops with all dead instructions includes "loops with no side-effects
outside of the loop"

Thanks again,
Hal

On Wed, Jan 27, 2016, 1:56 PM David Callahan via llvm-dev <llvm-dev at
lists.llvm.org<mailto:llvm-dev at lists.llvm.org>> wrote:
I have been looking at some internal codes looking for differences between Clang
(specifically 3.7.1) and gcc (typically 4.8.1 but sometimes later).

One area where I bumped into was dead code elimination in the presence of
complex control flow. I note that the “aggressive dead code elimination”
(ADCE.cpp) treats all branch operations as live (isa<TerminatorInst>(I)). 
Doing more requires some approximation to control dependence.

I note SimplifyCFG indirectly handles some simple cases. It will speculate the
contents of a basic block into a predecessor but this is insufficient for more
complex structures. This probably cherry-picks the most common cases by
frequency.

Have their been prior attempts strengthen dead code elimination w.r.t. control
flow? If so, any guidance on what went wrong?

Thanks
David

_______________________________________________
LLVM Developers mailing list
llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org>
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev<https://urldefense.proofpoint.com/v2/url?u=http-3A__lists.llvm.org_cgi-2Dbin_mailman_listinfo_llvm-2Ddev&d=CwMFaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=lFyiPUrFdOHdaobP7i4hoA&m=pCcZikgFQttaHaETuHc6G00dgArj_Spf58imKkXlTqk&s=NTh5Q1gE2ANS1rQYN9XFok_t8wvWCu1dzzzvHfv3hlI&e=>

_______________________________________________
LLVM Developers mailing list
llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org>
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev<https://urldefense.proofpoint.com/v2/url?u=http-3A__lists.llvm.org_cgi-2Dbin_mailman_listinfo_llvm-2Ddev&d=CwMFaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=lFyiPUrFdOHdaobP7i4hoA&m=7nHK35pXeFc5plcmZyxIPJwB8O8ZyHCJ4P89mN8BoZQ&s=BIA3X52dktMwTkOdClno4EdVyZHbON1Yd_OB0ZehhB0&e=>

--
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory

--
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160130/3d1bcfa8/attachment.html>

Mark Lacey via llvm-dev

2016-Feb-02 19:59 UTC

head link

[llvm-dev] DCE in the presence of control flow.

> On Jan 29, 2016, at 8:31 PM, David Callahan via llvm-dev <llvm-dev at
lists.llvm.org> wrote:
> 
> I think you can also avoid the RDF computation using a more directed form
of control dependence testing such as described in
> 
> Keshav Pingali and Gianfranco Bilardi. 1997. Optimal control dependence
computation and the Roman chariots problem. ACM Trans. Program. Lang. Syst. 19,
3 (May 1997), 462-491. DOI=http://dx.doi.org/10.1145/256167.256217
<http://dx.doi.org/10.1145/256167.256217>
> 
> However one challenge seems to be fixing the SSA graph after deleting
essentially arbitrary connected regions of the control flow graph. It may be
that the common case where deleted control flow has a single entry and a common
post-dominator which seems straight forward but in the general case it seems
much harder. Any prior experience on that problem?
> david
Hi David,

You may want to consider looking at the implementation of DCE in the Swift
optimizer. It should be fairly straightforward to follow despite not being LLVM
IR.

I implemented something inspired by the above paper, but it does not implement
the full construction of the sets mentioned in the paper. Instead it precomputes
some information to attempt to do a limited walk of the post-dominator tree. I
think it has worked well in practice.

The code for the precomputation portion starts here:
https://github.com/apple/swift/blob/master/lib/SILOptimizer/Transforms/DeadCodeElimination.cpp#L502
<https://github.com/apple/swift/blob/master/lib/SILOptimizer/Transforms/DeadCodeElimination.cpp#L502>

Feel free to ping me off list if you have any questions about it.

Mark

> 
> From: Hal Finkel <hfinkel at anl.gov <mailto:hfinkel at
anl.gov>>
> Date: Friday, January 29, 2016 at 4:50 PM
> To: Daniel Berlin <dberlin at dberlin.org <mailto:dberlin at
dberlin.org>>
> Cc: LLVM Dev Mailing list <llvm-dev at lists.llvm.org
<mailto:llvm-dev at lists.llvm.org>>, David Callahan <dcallahan at
fb.com <mailto:dcallahan at fb.com>>
> Subject: Re: [llvm-dev] DCE in the presence of control flow.
> 
> 
> 
> From: "Daniel Berlin" <dberlin at dberlin.org
<mailto:dberlin at dberlin.org>>
> To: "Hal Finkel" <hfinkel at anl.gov <mailto:hfinkel at
anl.gov>>
> Cc: "LLVM Dev Mailing list" <llvm-dev at lists.llvm.org
<mailto:llvm-dev at lists.llvm.org>>, "David Callahan"
<dcallahan at fb.com <mailto:dcallahan at fb.com>>
> Sent: Friday, January 29, 2016 12:48:37 AM
> Subject: Re: [llvm-dev] DCE in the presence of control flow.
> 
> 
> 
> On Thu, Jan 28, 2016 at 10:09 PM, Hal Finkel <hfinkel at anl.gov
<mailto:hfinkel at anl.gov>> wrote:
> 
> From: "David Callahan via llvm-dev" <llvm-dev at
lists.llvm.org <mailto:llvm-dev at lists.llvm.org>>
> To: "Daniel Berlin" <dberlin at dberlin.org <mailto:dberlin
at dberlin.org>>, "LLVM Dev Mailing list" <llvm-dev at
lists.llvm.org <mailto:llvm-dev at lists.llvm.org>>
> Sent: Thursday, January 28, 2016 11:35:49 PM
> Subject: Re: [llvm-dev] DCE in the presence of control flow.
> 
> Thanks
> Also I found that some cases are also caught by a specialized routine to
remove dead loops which is missing the case I noticed.
> odavd
> 
> From: Daniel Berlin <dberlin at dberlin.org <mailto:dberlin at
dberlin.org>>
> Date: Thursday, January 28, 2016 at 8:45 PM
> To: David Callahan <dcallahan at fb.com <mailto:dcallahan at
fb.com>>, LLVM Dev Mailing list <llvm-dev at lists.llvm.org
<mailto:llvm-dev at lists.llvm.org>>
> Subject: Re: [llvm-dev] DCE in the presence of control flow.
> 
> The post dominators computation costs more on llvm than GCC because of how
the respective cfgs work under the covers.  Even for GCC, when we implemented
cd-dce, it only catches 1-2% more cases.  It's not really worth the cost in
llvm unless postdom comes free
> 
> A 1-2% reduction in code size seems like it might well be worth a post-dom
calculation.
> 
> 1-2% more cases != 1-2% reduction in code size. In particular, it assumes
nothing else will catch those cases :)
> Fair enough.
>  
> The cases are mostly caught by SimplifyCFG/etc anyway
> 
> In any case, here are the numbers from when it was turned on by default:
> 
> https://gcc.gnu.org/ml/gcc-patches/2004-01/msg00675.html
<https://gcc.gnu.org/ml/gcc-patches/2004-01/msg00675.html>
> 
> Note: GCC is at least 3x faster at computing post-dom than LLVM
> Why?
> 
> Also, what exactly is the algorithm using post-dom info?
> 
> 
> So, to review for a second, right now, the algorithm answers the question
when is a branch necessary with "always" :)
> 
> The real answer is "when an already-necessary operation depends on its
existence".  This is of course, requires control-dependence to answer.
> So if you take our current DCE algorithm, and instead of marking
terminators always-live, it simply marks control dependent edges of those
operands as necessary, and branches that generate those edges.
> 
> (IE 
> 
> for each block in RDF(useful block):
>   mark terminator of block as useful
> 
> 
> FWIW, Keith Cooper's slide deck on this has a nice explanation:
https://www.cs.rice.edu/~keith/512/2011/Lectures/L04Dead-1up.pdf
<https://www.cs.rice.edu/~keith/512/2011/Lectures/L04Dead-1up.pdf>
> 
> We might be able to do this without precomputing an RDF, however. For
example, you could solve a data-flow problem on the reverse CFG, where for each
block you solve for the "next" live instruction. Then a branch is
alive only if the next live instruction for  each successor is different.
You'd need to have "next-live-instruction phi nodes" for cases
where there's not a unique answer, etc.
> 
> Thanks again,
> Hal
> 
> I suppose that we're looking for cases where we have a CFG diamond with
one having only dead instructions, and loops with all dead instructions, etc.
> 
> Yes.  Loops with all dead instructions includes "loops with no
side-effects outside of the loop"
>  
>  
> Thanks again,
> Hal
> 
> On Wed, Jan 27, 2016, 1:56 PM David Callahan via llvm-dev <llvm-dev at
lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> wrote:
> I have been looking at some internal codes looking for differences between
Clang (specifically 3.7.1) and gcc (typically 4.8.1 but sometimes later).
>  
> One area where I bumped into was dead code elimination in the presence of
complex control flow. I note that the “aggressive dead code elimination”
(ADCE.cpp) treats all branch operations as live (isa<TerminatorInst>(I)). 
Doing more requires some approximation to control dependence.
>  
> I note SimplifyCFG indirectly handles some simple cases. It will speculate
the contents of a basic block into a predecessor but this is insufficient for
more complex structures. This probably cherry-picks the most common cases by
frequency.
>  
> Have their been prior attempts strengthen dead code elimination w.r.t.
control flow? If so, any guidance on what went wrong?
>  
> Thanks
> David
>  
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
<https://urldefense.proofpoint.com/v2/url?u=http-3A__lists.llvm.org_cgi-2Dbin_mailman_listinfo_llvm-2Ddev&d=CwMFaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=lFyiPUrFdOHdaobP7i4hoA&m=pCcZikgFQttaHaETuHc6G00dgArj_Spf58imKkXlTqk&s=NTh5Q1gE2ANS1rQYN9XFok_t8wvWCu1dzzzvHfv3hlI&e=>
> 
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
<https://urldefense.proofpoint.com/v2/url?u=http-3A__lists.llvm.org_cgi-2Dbin_mailman_listinfo_llvm-2Ddev&d=CwMFaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=lFyiPUrFdOHdaobP7i4hoA&m=7nHK35pXeFc5plcmZyxIPJwB8O8ZyHCJ4P89mN8BoZQ&s=BIA3X52dktMwTkOdClno4EdVyZHbON1Yd_OB0ZehhB0&e=>
> 
> 
> 
> -- 
> Hal Finkel
> Assistant Computational Scientist
> Leadership Computing Facility
> Argonne National Laboratory
> 
> 
> 
> 
> -- 
> Hal Finkel
> Assistant Computational Scientist
> Leadership Computing Facility
> Argonne National Laboratory
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160202/96e13d99/attachment-0001.html>

David Callahan via llvm-dev

2016-Feb-02 20:42 UTC

head link

[llvm-dev] DCE in the presence of control flow.

Thanks for the pointer Mark.

From: <mark.lacey at apple.com<mailto:mark.lacey at apple.com>> on
behalf of Mark Lacey <mark.lacey at apple.com<mailto:mark.lacey at
apple.com>>
Date: Tuesday, February 2, 2016 at 11:59 AM
To: David Callahan <dcallahan at fb.com<mailto:dcallahan at fb.com>>
Cc: Hal Finkel <hfinkel at anl.gov<mailto:hfinkel at anl.gov>>,
Daniel Berlin <dberlin at dberlin.org<mailto:dberlin at
dberlin.org>>, LLVM Dev Mailing list <llvm-dev at
lists.llvm.org<mailto:llvm-dev at lists.llvm.org>>
Subject: Re: [llvm-dev] DCE in the presence of control flow.

On Jan 29, 2016, at 8:31 PM, David Callahan via llvm-dev <llvm-dev at
lists.llvm.org<mailto:llvm-dev at lists.llvm.org>> wrote:

I think you can also avoid the RDF computation using a more directed form of
control dependence testing such as described in

Keshav Pingali and Gianfranco Bilardi. 1997. Optimal control dependence
computation and the Roman chariots problem. ACM Trans. Program. Lang. Syst. 19,
3 (May 1997), 462-491.
DOI=http://dx.doi.org/10.1145/256167.256217<https://urldefense.proofpoint.com/v2/url?u=http-3A__dx.doi.org_10.1145_256167.256217&d=CwMF-g&c=5VD0RTtNlTh3ycd41b3MUw&r=lFyiPUrFdOHdaobP7i4hoA&m=bkSDsLc-hm2xvKwwoqqSo8EzOLlbhIiC39Y7SBxZk9E&s=f-W3wokV6YmCXwBRb1PHRkcDB4uTM8IRxP2iaSJBvYo&e=>

However one challenge seems to be fixing the SSA graph after deleting
essentially arbitrary connected regions of the control flow graph. It may be
that the common case where deleted control flow has a single entry and a common
post-dominator which seems straight forward but in the general case it seems
much harder. Any prior experience on that problem?
david

Hi David,

You may want to consider looking at the implementation of DCE in the Swift
optimizer. It should be fairly straightforward to follow despite not being LLVM
IR.

I implemented something inspired by the above paper, but it does not implement
the full construction of the sets mentioned in the paper. Instead it precomputes
some information to attempt to do a limited walk of the post-dominator tree. I
think it has worked well in practice.

The code for the precomputation portion starts here:
https://github.com/apple/swift/blob/master/lib/SILOptimizer/Transforms/DeadCodeElimination.cpp#L502

Feel free to ping me off list if you have any questions about it.

Mark

From: Hal Finkel <hfinkel at anl.gov<mailto:hfinkel at anl.gov>>
Date: Friday, January 29, 2016 at 4:50 PM
To: Daniel Berlin <dberlin at dberlin.org<mailto:dberlin at
dberlin.org>>
Cc: LLVM Dev Mailing list <llvm-dev at lists.llvm.org<mailto:llvm-dev at
lists.llvm.org>>, David Callahan <dcallahan at
fb.com<mailto:dcallahan at fb.com>>
Subject: Re: [llvm-dev] DCE in the presence of control flow.

________________________________
From: "Daniel Berlin" <dberlin at dberlin.org<mailto:dberlin at
dberlin.org>>
To: "Hal Finkel" <hfinkel at anl.gov<mailto:hfinkel at
anl.gov>>
Cc: "LLVM Dev Mailing list" <llvm-dev at
lists.llvm.org<mailto:llvm-dev at lists.llvm.org>>, "David
Callahan" <dcallahan at fb.com<mailto:dcallahan at fb.com>>
Sent: Friday, January 29, 2016 12:48:37 AM
Subject: Re: [llvm-dev] DCE in the presence of control flow.

On Thu, Jan 28, 2016 at 10:09 PM, Hal Finkel <hfinkel at
anl.gov<mailto:hfinkel at anl.gov>> wrote:

________________________________
From: "David Callahan via llvm-dev" <llvm-dev at
lists.llvm.org<mailto:llvm-dev at lists.llvm.org>>
To: "Daniel Berlin" <dberlin at dberlin.org<mailto:dberlin at
dberlin.org>>, "LLVM Dev Mailing list" <llvm-dev at
lists.llvm.org<mailto:llvm-dev at lists.llvm.org>>
Sent: Thursday, January 28, 2016 11:35:49 PM
Subject: Re: [llvm-dev] DCE in the presence of control flow.

Thanks
Also I found that some cases are also caught by a specialized routine to remove
dead loops which is missing the case I noticed.
odavd

From: Daniel Berlin <dberlin at dberlin.org<mailto:dberlin at
dberlin.org>>
Date: Thursday, January 28, 2016 at 8:45 PM
To: David Callahan <dcallahan at fb.com<mailto:dcallahan at
fb.com>>, LLVM Dev Mailing list <llvm-dev at
lists.llvm.org<mailto:llvm-dev at lists.llvm.org>>
Subject: Re: [llvm-dev] DCE in the presence of control flow.

The post dominators computation costs more on llvm than GCC because of how the
respective cfgs work under the covers.  Even for GCC, when we implemented
cd-dce, it only catches 1-2% more cases.  It's not really worth the cost in
llvm unless postdom comes free

A 1-2% reduction in code size seems like it might well be worth a post-dom
calculation.

1-2% more cases != 1-2% reduction in code size. In particular, it assumes
nothing else will catch those cases :)
Fair enough.

The cases are mostly caught by SimplifyCFG/etc anyway

In any case, here are the numbers from when it was turned on by default:

https://gcc.gnu.org/ml/gcc-patches/2004-01/msg00675.html

Note: GCC is at least 3x faster at computing post-dom than LLVM
Why?

Also, what exactly is the algorithm using post-dom info?

So, to review for a second, right now, the algorithm answers the question when
is a branch necessary with "always" :)

The real answer is "when an already-necessary operation depends on its
existence".  This is of course, requires control-dependence to answer.
So if you take our current DCE algorithm, and instead of marking terminators
always-live, it simply marks control dependent edges of those operands as
necessary, and branches that generate those edges.

(IE

for each block in RDF(useful block):
  mark terminator of block as useful

FWIW, Keith Cooper's slide deck on this has a nice explanation:
https://www.cs.rice.edu/~keith/512/2011/Lectures/L04Dead-1up.pdf<https://urldefense.proofpoint.com/v2/url?u=https-3A__www.cs.rice.edu_-7Ekeith_512_2011_Lectures_L04Dead-2D1up.pdf&d=CwMF-g&c=5VD0RTtNlTh3ycd41b3MUw&r=lFyiPUrFdOHdaobP7i4hoA&m=bkSDsLc-hm2xvKwwoqqSo8EzOLlbhIiC39Y7SBxZk9E&s=VP755B-IANixWhlYYMccob0Onb-q0t9Ba5lwiRuYFQs&e=>

We might be able to do this without precomputing an RDF, however. For example,
you could solve a data-flow problem on the reverse CFG, where for each block you
solve for the "next" live instruction. Then a branch is alive only if
the next live instruction for  each successor is different. You'd need to
have "next-live-instruction phi nodes" for cases where there's not
a unique answer, etc.

Thanks again,
Hal

I suppose that we're looking for cases where we have a CFG diamond with one
having only dead instructions, and loops with all dead instructions, etc.

Yes.  Loops with all dead instructions includes "loops with no side-effects
outside of the loop"

Thanks again,
Hal

On Wed, Jan 27, 2016, 1:56 PM David Callahan via llvm-dev <llvm-dev at
lists.llvm.org<mailto:llvm-dev at lists.llvm.org>> wrote:
I have been looking at some internal codes looking for differences between Clang
(specifically 3.7.1) and gcc (typically 4.8.1 but sometimes later).

One area where I bumped into was dead code elimination in the presence of
complex control flow. I note that the “aggressive dead code elimination”
(ADCE.cpp) treats all branch operations as live (isa<TerminatorInst>(I)). 
Doing more requires some approximation to control dependence.

I note SimplifyCFG indirectly handles some simple cases. It will speculate the
contents of a basic block into a predecessor but this is insufficient for more
complex structures. This probably cherry-picks the most common cases by
frequency.

Have their been prior attempts strengthen dead code elimination w.r.t. control
flow? If so, any guidance on what went wrong?

Thanks
David

_______________________________________________
LLVM Developers mailing list
llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org>
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev<https://urldefense.proofpoint.com/v2/url?u=http-3A__lists.llvm.org_cgi-2Dbin_mailman_listinfo_llvm-2Ddev&d=CwMFaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=lFyiPUrFdOHdaobP7i4hoA&m=pCcZikgFQttaHaETuHc6G00dgArj_Spf58imKkXlTqk&s=NTh5Q1gE2ANS1rQYN9XFok_t8wvWCu1dzzzvHfv3hlI&e=>

_______________________________________________
LLVM Developers mailing list
llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org>
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev<https://urldefense.proofpoint.com/v2/url?u=http-3A__lists.llvm.org_cgi-2Dbin_mailman_listinfo_llvm-2Ddev&d=CwMFaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=lFyiPUrFdOHdaobP7i4hoA&m=7nHK35pXeFc5plcmZyxIPJwB8O8ZyHCJ4P89mN8BoZQ&s=BIA3X52dktMwTkOdClno4EdVyZHbON1Yd_OB0ZehhB0&e=>

--
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory

--
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory
_______________________________________________
LLVM Developers mailing list
llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org>
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev<https://urldefense.proofpoint.com/v2/url?u=http-3A__lists.llvm.org_cgi-2Dbin_mailman_listinfo_llvm-2Ddev&d=CwMF-g&c=5VD0RTtNlTh3ycd41b3MUw&r=lFyiPUrFdOHdaobP7i4hoA&m=bkSDsLc-hm2xvKwwoqqSo8EzOLlbhIiC39Y7SBxZk9E&s=ihX4nALn0HVEVOWE9bAORQBaADv1-LWu9NpDaHJSqv4&e=>

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160202/93e5c57e/attachment.html>

Possibly Parallel Threads

Search for more maybe matching threads

llvm dev - Jan 2016 - DCE in the presence of control flow.

[llvm-dev] DCE in the presence of control flow.

[llvm-dev] DCE in the presence of control flow.

[llvm-dev] DCE in the presence of control flow.

[llvm-dev] DCE in the presence of control flow.

[llvm-dev] DCE in the presence of control flow.

Possibly Parallel Threads