Kaylor, Andrew via llvm-dev
2016-Mar-25 19:13 UTC
[llvm-dev] RFC: New support for triaging optimization-related failures in front ends
Hi Mehdi,

I started by trying to implement a single number approach, but it doesn't quite allow the approach I wanted in terms of isolating individual functions. Also, I think that there would be problems with that once parallel function optimization is enabled.

I am open to revisiting that approach to see if the problems I had with it can be resolved.

-Andy

From: mehdi.amini at apple.com [mailto:mehdi.amini at apple.com]
Sent: Friday, March 25, 2016 11:57 AM
To: Kaylor, Andrew <andrew.kaylor at intel.com>
Cc: llvm-dev at lists.llvm.org
Subject: Re: [llvm-dev] RFC: New support for triaging optimization-related failures in front ends

Hi Andy,

On Mar 25, 2016, at 11:41 AM, Kaylor, Andrew via llvm-dev <llvm-dev at lists.llvm.org> wrote:

The Intel C++ compiler has long had an internal facility to help our development teams track down the source of optimization-related bugs by allowing a sort of binary search in which optimization passes, and even individual optimizations, are selectively disabled in response to front end command line options. As we've been doing development work on LLVM, our developers have found that we miss this capability, so we would like to introduce something like it to the LLVM code base.

I am aware of the many existing facilities in LLVM for tracking down bugs, such as llc, opt and bugpoint. Our development teams are, of course, making use of these facilities, but we believe that the feature I am proposing today can powerfully complement those tools and provide a better way to automate defect finding in products that provide a front end for LLVM. It can also be used by customers (who may not be able to share their source code) to narrow down problems and provide a concise reproducer.

While the combination of llc, opt and bugpoint is very effective at locating compilation failures that can be reproduced from an LLVM IR or bitcode file, these tools can be cumbersome and difficult to automate, particularly when debugging runtime failures in programs with non-trivial build systems. The feature I am proposing provides a way to selectively turn off ranges of optimizations directly from a set of front end command line options.

The proposed feature works by assigning arbitrary but repeatable numbers to modules, functions, passes, etc. as a compilation unit is being optimized. The developer can then use these numbers to selectively disable parts of the optimization without otherwise altering the behavior of the compiler.

In keeping with the current LLVM pass manager structure, I am handling module passes, SCC passes and function passes separately. I propose handling loop, region and basic block passes as if they were function passes. (I think what this means will become clear below.) Because the handling of function passes illustrates all of the steps involved in using the proposed feature, I'll start by describing that case in detail. I'm describing this as a series of manual steps, but I intend that these steps could easily be automated (a sketch of one possible driver script follows the function-pass walkthrough below).

I propose introducing three command line options to support locating problems with function pass optimizations. I envision these as options available through the front end but implemented in the core LLVM library.
-num-fn
  sets an upper limit on the function to optimize
  -1 enables all and displays the numbering

-num-fn-pass
  sets an upper limit on the function pass to run on the limit function specified by -num-fn
  -1 enables all and displays the numbering
  ignored if -num-fn is not used or is -1
  all passes are run on functions below the limit function
  only necessary and analysis passes are run on functions above the limit function

-num-fn-case
  sets an upper limit on the optimization case to apply within the limit function pass specified by -num-fn-pass
  -1 enables all and displays the numbering
  ignored if -num-fn-pass is not used or is -1
  all cases are applied in passes below the limit function pass
  no cases are applied in optional passes above the limit function pass

As an example, a developer searching for function-pass-related optimization problems (assuming a case which fails when compiled with -O2 but passes when compiled with -O0) would begin by using the '-num-fn=-1' option to see the function numbering.

clang -c -O2 -num-fn=-1 test.c

Optimizing function (1) prune_match
Optimizing function (2) free_S
Optimizing function (3) hash_S
Optimizing function (4) insert_S
Optimizing function (5) zero_S
Optimizing function (6) init_S
Optimizing function (7) matches_S
Optimizing function (8) clean_up

The developer would then use a binary search, recompiling with selective optimization and re-running the test case, to determine the function in which the problem occurs.

clang -c -O2 -num-fn=4 test.c
<test passes>
clang -c -O2 -num-fn=6 test.c
<test passes>
clang -c -O2 -num-fn=7 test.c
<test fails>

Having found the problem function, the developer would use the '-num-fn-pass=-1' option to see the numbering of function passes (including loop, region and basic block passes) that are run on that function.

clang -c -O2 -num-fn=7 -num-fn-pass=-1 test.c

Optimizing function (1) prune_match
Optimizing function (2) free_S
Optimizing function (3) hash_S
Optimizing function (4) insert_S
Optimizing function (5) zero_S
Optimizing function (6) init_S
Optimizing function (7) matches_S
running necessary pass Function Pass Manager on function (7) matches_S
running necessary pass Module Verifier on function (7) matches_S
running necessary pass Add DWARF path discriminators (3) on function (7) matches_S
running pass (1) Simplify the CFG on function (7) matches_S
running analysis pass Dominator Tree Construction on function (7) matches_S
running pass (2) SROA on function (7) matches_S
running pass (3) Early CSE on function (7) matches_S
running pass (4) Lower 'expect' Intrinsics on function (7) matches_S
<additional passes ignored for brevity>
NOT Optimizing function (8) clean_up

The user would again use a binary search to determine the problem pass.

clang -c -O2 -num-fn=7 -num-fn-pass=2 test.c
<test passes>
clang -c -O2 -num-fn=7 -num-fn-pass=3 test.c
<test fails>

Having determined the problem pass, the developer would use the '-num-fn-case=-1' option to see the numbering of individual optimizations applied by the pass.
clang -c -O2 -num-fn=7 -num-fn-pass=3 -num-fn-case=-1 test.c

Optimizing function (1) prune_match
Optimizing function (2) free_S
Optimizing function (3) hash_S
Optimizing function (4) insert_S
Optimizing function (5) zero_S
Optimizing function (6) init_S
Optimizing function (7) matches_S
running necessary pass Function Pass Manager on function (7) matches_S
running necessary pass Module Verifier on function (7) matches_S
running necessary pass Add DWARF path discriminators (3) on function (7) matches_S
running pass (1) Simplify the CFG on function (7) matches_S
running analysis pass Dominator Tree Construction on function (7) matches_S
running pass (2) SROA on function (7) matches_S
running pass (3) Early CSE on function (7) matches_S
Another case (1): EarlyCSE CSE value
Another case (2): EarlyCSE CSE value
Another case (3): EarlyCSE CSE value
Another case (4): EarlyCSE CSE load
Another case (5): EarlyCSE CSE value
Another case (6): EarlyCSE CSE load
Another case (7): EarlyCSE CSE value
Another case (8): EarlyCSE CSE load
NOT running pass (4) Lower 'expect' Intrinsics on function (7) matches_S
<additional passes ignored for brevity>
NOT Optimizing function (8) clean_up

Finally, the developer would use one last binary search to determine which optimization case was causing the failure.

clang -c -O2 -num-fn=7 -num-fn-pass=3 -num-fn-case=4 test.c
<test fails>
clang -c -O2 -num-fn=7 -num-fn-pass=3 -num-fn-case=2 test.c
<test passes>
clang -c -O2 -num-fn=7 -num-fn-pass=3 -num-fn-case=3 test.c
<test fails>

Most of the above functionality can be implemented by inserting hooks into the various pass managers. Support for the '-num-fn-case' option would require instrumentation of individual passes. I propose implementing this on an opt-in basis so that it can be added to passes as needed; passes which do not opt in will simply not report having selectable cases.

Note that I have introduced the concept of a "necessary" pass. Because the goal is to have the front end produce the same sort of object or executable file that it normally would, some passes (such as the register allocator and its various helper passes) cannot be skipped. I am also, for now, proposing that all analysis passes always be run, to avoid the problem of determining which analysis passes would be required by later "necessary" passes. We can revisit this as needed.

As I've said, I intend to handle loop, region and basic block passes as if they were just another function pass. While I would like to display an associated number for the target loop, region or block, I don't see value in being able to filter passes based on this value (because these are relatively dynamic constructs), so consecutive runs of a single loop pass (for instance) on different loops would be identified as distinct function passes. For example:

running pass (25) Loop Invariant Code Motion on loop (1), function (7) matches_S
running pass (26) Loop Invariant Code Motion on loop (2), function (7) matches_S
running pass (27) Loop Invariant Code Motion on loop (3), function (7) matches_S

I would also like to have a feature that would allow the developer to request that an LLVM IR/bitcode file be emitted just before the first disabled pass so that the isolated IR could be used with opt, llc or bugpoint for further debugging. I haven't worked out exactly how this would work, so I'm just mentioning it here as a fuzzy part of the proposal.
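To make the automation mentioned above concrete, here is a minimal sketch of the kind of driver script that could run the staged binary search. It is only a sketch under assumptions: the proposed -num-fn and -num-fn-pass options do not exist in clang today, the numbering diagnostics are assumed to be printed on the compiler's standard output or standard error, and ./run_test.sh is a placeholder for whatever links and runs the failing test case (it must exit nonzero when the test fails).

#!/usr/bin/env python3
# Sketch only: automates the staged binary search described above, assuming the
# proposed -num-fn/-num-fn-pass options exist and print their numbering on
# stdout/stderr. "./run_test.sh" is a placeholder test script that exits
# nonzero when the test fails.
import subprocess

CLANG = ["clang", "-c", "-O2"]
SOURCE = "test.c"
TEST = ["./run_test.sh"]

def test_passes(flags):
    """Compile with the given -num-* flags, run the test, return True on pass."""
    subprocess.check_call(CLANG + flags + [SOURCE])
    return subprocess.call(TEST) == 0

def count_items(flag, fixed, marker):
    """Run once with <flag>=-1 and count the numbered lines in the diagnostics."""
    res = subprocess.run(CLANG + fixed + [flag + "=-1", SOURCE],
                         capture_output=True, text=True)
    return sum(1 for line in (res.stdout + res.stderr).splitlines()
               if marker in line)

def smallest_failing(flag, fixed, total):
    """Binary search for the smallest limit N at which the test still fails."""
    lo, hi = 0, total          # invariant: limit lo passes, limit hi fails
    while lo + 1 < hi:
        mid = (lo + hi) // 2
        if test_passes(fixed + ["%s=%d" % (flag, mid)]):
            lo = mid
        else:
            hi = mid
    return hi

n_fns = count_items("-num-fn", [], "Optimizing function (")
bad_fn = smallest_failing("-num-fn", [], n_fns)
fixed = ["-num-fn=%d" % bad_fn]
n_passes = count_items("-num-fn-pass", fixed, "running pass (")
bad_pass = smallest_failing("-num-fn-pass", fixed, n_passes)
# The -num-fn-case level would follow exactly the same pattern.
print("first failing combination: -num-fn=%d -num-fn-pass=%d" % (bad_fn, bad_pass))

The search relies on the same assumption as the manual procedure: compiling with a limit of 0 behaves like -O0 and passes, while the full limit reproduces the -O2 failure, so the smallest failing limit identifies the problem function and pass.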
Circling back now to module and SCC passes, these would work in much the same way as the function pass handling above, so I won't cover them in as much detail. I propose the following new command line options.

-num-mod
  sets an upper limit on the module on which to run module passes
  -1 enables all and displays the numbering
  defaults to 1 since I expect most front ends to usually operate on a single module

-num-mod-pass
  sets an upper limit on the module pass to run on the limit module specified by -num-mod
  -1 enables all and displays the numbering
  ignored if -num-mod is -1
  all passes are run on modules below the limit module
  only necessary and analysis passes are run on modules above the limit module

-num-mod-case
  sets an upper limit on the optimization case to apply within the limit module pass specified by -num-mod-pass
  -1 enables all and displays the numbering
  ignored if -num-mod-pass is not used or is -1
  all cases are applied in passes below the limit module pass
  no cases are applied in optional passes above the limit module pass

-num-scc-pass
  sets an upper limit on the SCC pass to run
  -1 enables all and displays the numbering

-num-scc-case
  sets an upper limit on the optimization case to apply within the limit SCC pass specified by -num-scc-pass
  -1 enables all and displays the numbering
  ignored if -num-scc-pass is not used or is -1
  all cases are applied in passes below the limit SCC pass
  no cases are applied in optional passes above the limit SCC pass

As with loops, regions and basic blocks, I would like to present numbering information for each unique SCC, but I do not believe there is value in being able to filter on a specific SCC because SCCs are subject to reorganization. Rather, I would treat each invocation of an SCC pass as a separate selectable pass. The output for an SCC problem search would look something like this:

Optimizing SCC (1): <null function>
running pass (1) Remove unused exception handling info on SCC (1)
running pass (2) Function Integration/Inlining on SCC (1)
running pass (3) Deduce function attributes on SCC (1)
Optimizing SCC (2): isupper
running pass (4) Remove unused exception handling info on SCC (2)
running pass (5) Function Integration/Inlining on SCC (2)
running pass (6) Deduce function attributes on SCC (2)
<snip>
Optimizing SCC (34): or_purge_E_list, and_purge_E_list, purge_Exp
running pass (101) Remove unused exception handling info on SCC (34)
running pass (102) Function Integration/Inlining on SCC (34)
running pass (103) Deduce function attributes on SCC (34)
<snip>

Here I'm printing the functions (if any) associated with the nodes in the SCC just to give the developer some feedback on which part of the code is being worked on.

I have a working prototype of all of the above functionality using both the new pass manager and the legacy pass manager.

So what do you think? Do you agree that support for a feature of this sort would be useful in the LLVM core library? What problems do you see in my proposal? What additional options would you like to see added?

Thanks in advance for any feedback.


You're right that we could benefit from support to help "bisecting" where a miscompile happens in a more automated way.

Reading through your (long but nice) description, I thought about something simpler: a single number incremented every time a pass in the optimizer is run (following the order shown by -debug-pass=Executions).
That way you run once to get the total number of passes that ran, and then a simple bisect script can find quite quickly at which point the miscompile is introduced (between 0 and max). I haven't thought about it much, but I'm interested in your view of the pros/cons.

-- 
Mehdi
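For comparison with the staged search above, the single-counter bisection Mehdi describes would be only a few lines of script. The -opt-limit=N option used here is purely hypothetical (no such flag exists; it stands in for "disable every optional pass execution after the Nth"), and ./run_test.sh is again a placeholder test script that exits nonzero when the miscompile reproduces.

#!/usr/bin/env python3
# Sketch only: bisect a single global pass-execution counter, assuming a
# hypothetical -opt-limit=N option and a placeholder ./run_test.sh.
import subprocess

def fails(limit):
    subprocess.check_call(["clang", "-c", "-O2", "-opt-limit=%d" % limit, "test.c"])
    return subprocess.call(["./run_test.sh"]) != 0

total = 500                 # total pass executions, taken from an initial run
lo, hi = 0, total           # invariant: limit lo passes, limit hi fails
while lo + 1 < hi:
    mid = (lo + hi) // 2
    if fails(mid):
        hi = mid
    else:
        lo = mid
print("miscompile first appears at pass execution #%d" % hi)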
Mehdi Amini via llvm-dev
2016-Mar-25 19:37 UTC
[llvm-dev] RFC: New support for triaging optimization-related failures in front ends
> On Mar 25, 2016, at 12:13 PM, Kaylor, Andrew <andrew.kaylor at intel.com> wrote:
>
> Hi Mehdi,
>
> I started by trying to implement a single number approach, but it doesn't quite allow the approach I wanted in terms of isolating individual functions.

Do you have an example where it matters? I wonder if there are enough real-world use cases to justify the complexity?

> Also, I think that there would be problems with that once parallel function optimization is enabled.

Honestly this is not going to happen for years. And even if we had it, we're talking about debugging and we could turn off the parallelism for that.

-- 
Mehdi
Sanjoy Das via llvm-dev
2016-Mar-25 19:39 UTC
[llvm-dev] RFC: New support for triaging optimization-related failures in front ends
Maybe someone on the list who knows about GHC internals (I don't) can chime in here, but GHC has (had?) a concept of "optimization fuel": http://blog.ezyang.com/2011/06/debugging-compilers-with-optimization-fuel/ that is very similar to what Mehdi is suggesting.

-- Sanjoy
Robinson, Paul via llvm-dev
2016-Mar-25 19:42 UTC
[llvm-dev] RFC: New support for triaging optimization-related failures in front ends
+1 to this general idea. I used the Intel facility in a previous job where we had licensed their compiler, and it was fantastically helpful.

--paulr
In keeping with the current LLVM pass manager structure, I am handling module passes, SCC passes and function passes separately. I propose handling loop, region and basic block passes as if they were function passes. (I think what this means will become clear below.) Because the handling of function passes illustrates well all of the steps involved in using the feature I am proposing, I'll start by describing that case in detail. I'm describing this as a series of manual steps, but I intend that these steps could be easily automated. I propose introducing three command line options to support locating problems with function pass optimizations. I envision these as options available through the front end but implemented in the core LLVM library. -num-fn sets an upper limit on the function to optimize -1 enables all and displays the numbering -num-fn-pass sets an upper limit on the function pass to run on the limit function specified by -num-fn -1 enables all and displays the numbering ignored if -num-fn is not used or is -1 all passes are run on functions below the limit function only necessary and analysis passes are run on functions above the limit function -num-fn-case sets an upper limit on the optimization case to apply within the limit function pass specified by -num-fn-pass -1 enables all and displays the numbering ignored if -num-fn-pass is not used or is -1 all cases are applied in passes below the limit function pass no cases are applied in optional passes below the limit function pass As an example, a developer searching for function pass related optimization problems (assuming a case which fails when compiled with -O2 but passes when compiled with -O0) would begin by using the '-num-fn=-1' option to see the functioning numbering. clang -c -O2 -num-fn=-1 test.c Optimizing function (1) prune_match Optimizing function (2) free_S Optimizing function (3) hash_S Optimizing function (4) insert_S Optimizing function (5) zero_S Optimizing function (6) init_S Optimizing function (7) matches_S Optimizing function (8) clean_up The developer would then use a binary search, recompiling with selective optimization and re-running the test case to determine the function in which the problem occurs. clang -c -O2 -num-fn=4 test.c <test passes> clang -c -O2 -num-fn=6 test.c <test passes> clang -c -O2 -num-fn=7 test.c <test fails> Having found the problem function, the developer would use the '-num-fn-pass=-1' option to see the numbering of function passes (including loop, region and basic block passes) that are run on that function. clang -c -O2 -num-fn=7 -num-fn-pass=-1 test.c Optimizing function (1) prune_match Optimizing function (2) free_S Optimizing function (3) hash_S Optimizing function (4) insert_S Optimizing function (5) zero_S Optimizing function (6) init_S Optimizing function (7) matches_S running necessary pass Function Pass Manager on function (7) matches_S running necessary pass Module Verifier on function (7) matches_S running necessary pass Add DWARF path discriminators (3) on function (7) matches_S running pass (1) Simplify the CFG on function (7) matches_S running analysis pass Dominator Tree Construction on function (7) matches_S running pass (2) SROA on function (7) matches_S running pass (3) Early CSE on function (7) matches_S running pass (4) Lower 'expect' Intrinsics on function (7) matches_S <additional passes ignored for brevity> NOT Optimizing function (8) clean_up The user would again use a binary search to determine the problem pass. 
clang -c -O2 -num-fn=7 -num-fn-pass=2 test.c <test passes> clang -c -O2 -num-fn=7 -num-fn-pass=3 test.c <test fails> Having determined the problem pass, the developer would use the '-num-fn-case=-1' option to see the numbering of individual optimizations applied by the pass. clang -c -O2 -num-fn=7 -num-fn-pass=3 -num-fn-case=-1 test.c Optimizing function (1) prune_match Optimizing function (2) free_S Optimizing function (3) hash_S Optimizing function (4) insert_S Optimizing function (5) zero_S Optimizing function (6) init_S Optimizing function (7) matches_S running necessary pass Function Pass Manager on function (7) matches_S running necessary pass Module Verifier on function (7) matches_S running necessary pass Add DWARF path discriminators (3) on function (7) matches_S running pass (1) Simplify the CFG on function (7) matches_S running analysis pass Dominator Tree Construction on function (7) matches_S running pass (2) SROA on function (7) matches_S running pass (3) Early CSE on function (7) matches_S Another case (1): EarlyCSE CSE value Another case (2): EarlyCSE CSE value Another case (3): EarlyCSE CSE value Another case (4): EarlyCSE CSE load Another case (5): EarlyCSE CSE value Another case (6): EarlyCSE CSE load Another case (7): EarlyCSE CSE value Another case (8): EarlyCSE CSE load NOT running pass (4) Lower 'expect' Intrinsics on function (7) matches_S <additional passes ignored for brevity> NOT Optimizing function (8) clean_up Finally, the developer would use one last binary search to determine which optimization case was causing the failure. clang -c -O2 -num-fn=7 -num-fn-pass=3 -num-fn-case=4 test.c <test fails> clang -c -O2 -num-fn=7 -num-fn-pass=3 -num-fn-case=2 test.c <test passes> clang -c -O2 -num-fn=7 -num-fn-pass=3 -num-fn-case=3 test.c <test fails> Most of the above functionality can be implemented by inserting hooks into the various pass managers. Support for the '-num-fn-case' option would require instrumentation of individual passes. I propose implementing this on an 'opt in' basis so that it can be added to passes as needed and passes which do not opt in will simply not report having selectable cases. Note that I have introduced the concept of a "necessary" pass. Because the goal of this is to have the front end produce the same sort of object or executable file that it normally would, some passes (such as the register allocator and its various helper passes) cannot be skipped. I am also, for now, proposing that all analysis passes be run always to avoid the problem of determining which analysis passes would be required for later "necessary" passes. We can revisit this as needed. As I've said, I intend to handle loop, region and basic block passes as if they were just another function pass. While I would like to display an associated number for the target loop, region or block, I don't see value in being able to filter passes based on this value (because they are relatively dynamic constructs), so consecutive runs of a single loop pass (for instance) on different loops would be identified as distinct function passes. 
For example: running pass (25) Loop Invariant Code Motion on loop (1), function (7) matches_S running pass (26) Loop Invariant Code Motion on loop (2), function (7) matches_S running pass (27) Loop Invariant Code Motion on loop (3), function (7) matches_S I would also like to have a feature that would allow the developer to request that an LLVM IR/bitcode file be emitted just before the first disabled pass so that the isolated IR could be used with opt, llc or bugpoint for further debugging. I haven't worked out exactly how this would work, so I'm just mentioning it here as a fuzzy part of the proposal. Circling back now to module and SCC passes, these would work in pretty much the same way as the above method for function passes so I won't cover them in as much detail. I propose the following new command line options. -num-mod sets an upper limit on the module on which to run module passes -1 enables all and displays the numbering defaults to 1 since I expect most front ends to usually operate on a single module -num-mod-pass sets an upper limit on the module pass to run on the limit module specified by -num-mod -1 enables all and displays the numbering ignored if -num-mod is -1 all passes are run on modules below the limit module only necessary and analysis passes are run on modules above the limit module -num-mod-case sets an upper limit on the optimization case to apply within the limit module pass specified by -num-mod-pass -1 enables all and displays the numbering ignored if -num-mod-pass is not used or is -1 all cases are applied in passes below the limit module pass no cases are applied in optional passes below the limit module pass -num-scc-pass sets an upper limit on the SCC pass to run -1 enables all and displays the numbering -num-scc-case sets an upper limit on the optimization case to apply within the limit SCC pass specified by -num-scc-pass -1 enables all and displays the numbering ignored if -num-scc-pass is not used or is -1 all cases are applied in passes below the limit SCC pass no cases are applied in optional passes below the limit SCC pass Similar to loops, regions and basic blocks, I would like to present numbering information for each unique SCC, but I do not believe there is value in being able to filter on a specific SCC because they are subject to reorganization. Rather I would treat each invocation of an SCC pass as a separate selectable pass. The output for an SCC problem search would look something like this: Optimizing SCC (1): <null function> running pass (1) Remove unused exception handling info on SCC (1) running pass (2) Function Integration/Inlining on SCC (1) running pass (3) Deduce function attributes on SCC (1) Optimizing SCC (2): isupper running pass (4) Remove unused exception handling info on SCC (2) running pass (5) Function Integration/Inlining on SCC (2) running pass (6) Deduce function attributes on SCC (2) <snip> Optimizing SCC (34): or_purge_E_list, and_purge_E_list, purge_Exp running pass (101) Remove unused exception handling info on SCC (34) running pass (102) Function Integration/Inlining on SCC (34) running pass (103) Deduce function attributes on SCC (34) <snip> Here I'm printing the functions (if any) associated with nodes in the SCC just to give the developer some feedback on which part of the code is being worked on. I have a working prototype of all of the above functionality using both the new pass manager and the legacy pass manager. So what do you think? 
Do you agree that support for a feature of this sort would be useful in the LLVM core library? What problems do you see in my proposal? What additional options would you like to see added?

Thanks in advance for any feedback.


You're right that we could benefit from support to help "bisecting" where a miscompile happens in a more automated way.

Reading through your (long but nice) description, I thought about something simpler: a single number incremented every time a pass in the optimizer is run (following the order shown by -debug-pass=Executions). That way you run once to get the total number of passes that ran, and then a simple bisect script can find quite quickly at which point the miscompile is introduced (between 0 and max). I haven't thought about it much, but I'm interested in your view of the pros/cons.

--
Mehdi
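To make the "simple bisect script" idea concrete, here is a minimal sketch of such a driver. It assumes a hypothetical single-counter flag (spelled -opt-limit=N here, meaning "run only the first N optimizer pass executions") and placeholder build and test commands; the real flag name and the maximum count would come from an initial full run.

#!/usr/bin/env python
# Minimal sketch of a single-counter bisect driver.  The -opt-limit flag is
# hypothetical, and 'test.c' / './test' are placeholders for a real build
# system and test harness.
import subprocess

def test_passes(limit):
    # Build with only the first `limit` pass executions enabled, then run the test.
    subprocess.run(['clang', '-O2', '-opt-limit=%d' % limit,
                    'test.c', '-o', 'test'], check=True)
    return subprocess.run(['./test']).returncode == 0

def bisect(max_passes):
    # Assumes the test passes with 0 optimization passes and fails with all of them.
    lo, hi = 0, max_passes
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if test_passes(mid):
            lo = mid
        else:
            hi = mid
    return hi   # first pass execution whose inclusion makes the test fail

# max_passes would be taken from the count reported by an initial full run.
print('miscompile introduced by pass execution %d' % bisect(max_passes=500))

A loop like this converges in roughly log2(max) rebuilds, which is what makes the single-number scheme attractive for scripting.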
Kaylor, Andrew via llvm-dev
2016-Mar-25 20:10 UTC
[llvm-dev] RFC: New support for triaging optimization-related failures in front ends
> Do you have an example where it matters? I wonder if there are enough real-world use cases that justify the complexity?

That's a fair question. The implementation I proposed maps pretty closely to the kinds of distinctions that the pass managers make and the way they group the passes, so it felt natural to me, but it may not be strictly necessary and it certainly complicates the use of the feature a bit. At least in theory, grouping all of the optimizations as applied to all of the functions in a module has the potential to greatly increase the number of iterations in the search, but given the way in which the optimization passes are grouped for running I'm not sure that penalty would appear in practice.

Thinking back on the work I did on this, I initially abandoned the single number approach because I was trying to disable the initialization and finalization calls to passes too, and that wasn't working at all when trying to interleave module and function passes. After further exploration I decided that it wasn't worth trying to manage initialization and finalization, at least in the initial feature implementation, but if we ever wanted to bring that back it could cause problems.

I would really like to be able to dump an IR file containing only the failing function once that function is identified, but I suppose that could be worked into a single number approach.

My main requirement for the numbering is that it needs to be repeatable at least up to the point where we start disabling optimizations. We also need the disabling of passes to have minimal impact on later "necessary" passes, so that we can be sure the search will converge reliably.

Anyway, the single number approach would be a fairly simple thing to implement with the prototype code I have, so I'll give it another try and see how it compares.

> Honestly [parallel compilation] is not going to happen in years. And even if we had it, we're talking about debugging and we could turn off the parallelism for that.

I suppose that's true. Some of the test cases I've worked with take a really, really long time to build, so I hate to give up any compile time enhancements that might be available, but ultimately it is something we could do without sacrificing functionality. There is the possibility that this would make it difficult to find any problems that were related to the parallelization, but that will be a sticky situation anyway.

-Andy

> <snip: earlier messages in this thread quoted in full>
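For comparison, the multi-level flags proposed earlier are just as scriptable. The following is a rough sketch, not part of the proposal itself: it assumes the -num-fn / -num-fn-pass / -num-fn-case flags and the "Optimizing function (N)", "running pass (N)" and "Another case (N)" output lines behave exactly as described above, and it uses placeholder build and test commands.

#!/usr/bin/env python
# Sketch of a nested bisection driver over the proposed function-level flags.
import re
import subprocess

def run_clang(flags):
    # Compile with the given numbering/limit flags and return the diagnostics.
    p = subprocess.run(['clang', '-c', '-O2'] + flags + ['test.c'],
                       capture_output=True, text=True)
    if p.returncode != 0:
        raise RuntimeError('compile failed with flags %r' % flags)
    return p.stdout + p.stderr

def count_items(flags, pattern):
    # Run once with numbering enabled (-1) and count the numbered items printed.
    return len(re.findall(pattern, run_clang(flags)))

def test_passes(flags):
    # Rebuild with the given limit flags and report whether the test still passes.
    run_clang(flags)
    subprocess.run(['clang', 'test.o', '-o', 'test'], check=True)
    return subprocess.run(['./test']).returncode == 0

def first_failing(flag, limit, base):
    # Smallest value of `flag` at which the test fails; `base` flags stay fixed.
    lo, hi = 0, limit    # test is assumed to pass at lo and fail at hi
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if test_passes(base + ['%s=%d' % (flag, mid)]):
            lo = mid
        else:
            hi = mid
    return hi

n_fn = count_items(['-num-fn=-1'], r'Optimizing function \(\d+\)')
fn = first_failing('-num-fn', n_fn, [])
base = ['-num-fn=%d' % fn]
n_pass = count_items(base + ['-num-fn-pass=-1'], r'running pass \(\d+\)')
fp = first_failing('-num-fn-pass', n_pass, base)
base += ['-num-fn-pass=%d' % fp]
n_case = count_items(base + ['-num-fn-case=-1'], r'Another case \(\d+\)')
fc = first_failing('-num-fn-case', n_case, base)
print('suspect: function %d, pass %d, case %d' % (fn, fp, fc))

Each level is an independent binary search, so the total number of rebuilds is roughly the sum of log2 of the function count, the pass count, and the case count.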
Matthias Braun via llvm-dev
2016-Mar-25 23:30 UTC
[llvm-dev] RFC: New support for triaging optimization-related failures in front ends
Andy,

I'd love to see this feature. I wouldn't care too much whether the interface is a single number or two numbers, but if possible print the name of the function and the name of the pass that was just skipped so the developer knows where to start looking.

- Matthias

> <snip: earlier messages in this thread quoted in full>
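On the point about reporting names rather than bare numbers: if the numbering output keeps the format shown in the proposal, a driver script can recover the names itself. A small sketch, assuming the output lines look exactly like the examples above and have been captured to a log file (the file name is a placeholder):

#!/usr/bin/env python
# Map the limits found by bisection back to a function name and pass name,
# using the "Optimizing function (N) name" and "running pass (N) name on
# function ..." lines from a captured numbering run.
import re

def names_from_log(log_text, fn_limit, pass_limit):
    functions = dict(re.findall(r'Optimizing function \((\d+)\) (\S+)', log_text))
    passes = dict(re.findall(r'running pass \((\d+)\) (.+?) on function', log_text))
    return functions.get(str(fn_limit)), passes.get(str(pass_limit))

with open('numbering.log') as f:
    fn_name, pass_name = names_from_log(f.read(), 7, 3)
print('first bad result: pass %r on function %r' % (pass_name, fn_name))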
Michael Gottesman via llvm-dev
2016-Mar-25 23:34 UTC
[llvm-dev] RFC: New support for triaging optimization-related failures in front ends
> On Mar 25, 2016, at 4:30 PM, Matthias Braun via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> Andy,
>
> I'd love to see this feature. I wouldn't care too much whether the interface is a single number or two numbers, but if possible print the name of the function and the name of the pass that was just skipped so the developer knows where to start looking.

We do this in swift actually as well. We have a feature called -sil-print-pass-name. It causes the pass manager to print out the action the pass manager is performing and an identifier of the object it is performing the action on. When you get the final number, you can rerun with that option set and it will tell you the last pass invocation. It makes it /really/ easy.

Michael

> <snip: earlier messages in this thread quoted in full>