thr3ads.net - llvm dev - [LLVMdev] GSoC 2011: Superoptimization for LLVM IR [Apr 2011]

If this information is useful, please help other people find it:
Share via:

Rafael Auler

2011-Apr-06 21:09 UTC

[LLVMdev] GSoC 2011: Superoptimization for LLVM IR

Hello,

I want to present my project for GSoC 2011 for LLVM below. It would be very
nice to hear suggestions and your opinion, thanks!

Superoptimization for LLVM IR

Objective

This project focuses on implementing superoptimization algorithms targeted
at
the LLVM IR. The project uses arbitrary LLVM bitcode as a training set to
discover new peephole optimizations that can be later integrated into LLVM
instruction simplify phase or as a separate peephole optimization pass.

Motivation

Code optimization is the task of using a faster sequence of instructions
which
compute the same information, compared to a previous code sequence. The old
sequence may then be substituted by the new one, creating a faster program
with the same behavior. The problem of ﬁnding a code sequence that does some
computation in optimal time is undecidable and, in practice, optimizations
are
discovered by humans using analytical reasoning. Finding new optimizations
in
an automated fashion, using a search algorithm, is a diﬀerent approach to
the
problem and became known in literature as superoptimization, term coined by
Massalin [2].
Using computer-aided optimization discovery, diﬀerent kind of optimizations
unnoticed by humans may be detected, specially those speciﬁc to the target
machine code [1]. This project will focus on implementing a superoptimizer
which will exhaustively investigate new optimizations for the LLVM IR. Using
this approach, I hope to greatly contribute to the current LLVM instruction
simplify phase, introducing new substitutions identiﬁed in this project.


Methodology

There is already a superoptimizer for LLVM in initial stages of development.
The existing superoptimizer currently limits its search space to a few
transfor-
mations. This project will focus on improving this current superoptimizer by
extending the number of transformations applied and thus greatly increasing
the search space.

In order to test the superoptimizer eﬃciency, it will use LLVM bitcodes
compiled from the entire LLVM test suite and SPEC to harvest possible code
sequence optimizations. When successfully identiﬁed, these optimizations
could
be implemented directly as peephole optimizations for LLVM IR.

Timeline

This project is expected to last three months, as follows.

• 1st month, 1st week - Study the LLVM IR in order to establish a
list of axioms, that is, valid transformations that determine the search
algorithm capability. There is an important trade-oﬀ to investigate here,
since using fewer transformations will lead to faster searches, but several
good optimizations may be missed.
• 1st month, 2nd week - Thorough study of the current implemented
base and new algorithm design to support arbitrary transformations.
• 1st month, 3rd week until end of project - Implementation of the
search algorithm.
• 2nd month - Test phase begins. Test algorithm with simple transforma-
tions searching for optimized code sequences.
• 3rd month - Harvesting new optimizations and implementing ﬁxes to
the search algorithm as more performance feedback from training data is
received.

Biography

I am a Master student at the University of Campinas (UNICAMP), Brazil (time-
zone GMT -3). In my master thesis, I have already worked with superoptimiza-
tion theory and automatic code sequence searching. But instead of using it
for
optimization purposes, I use this to automatically generate a LLVM backend
based on a machine description using the ArchC architecture description lan-
guage. The backend patterns are ﬁxed and the implementation of the pattern
is inferred using a search algorithm similar to the one used on
superoptimizers.
Thus, I have experience both on superoptimizer theory and LLVM code.


References

[1] Sorav Bansal and Alex Aiken. Automatic generation of peephole superopti-
mizers. SIGPLAN Not., 41:394–403, October 2006.

[2] Henry Massalin. Superoptimizer: a look at the smallest program. In
ASPLOS-II: Proceedings of the second international conference on Architec-
tual support for programming languages and operating systems, pages 122–
126, Los Alamitos, CA, USA, 1987. IEEE Computer Society Press.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20110406/2161e6bb/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Auler,GSoC2011.pdf
Type: application/pdf
Size: 102451 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20110406/2161e6bb/attachment.pdf>

Hendrik vP

2011-Apr-07 12:47 UTC

head link

[LLVMdev] GSoC 2011: Superoptimization for LLVM IR

I am no llvm/compiler/optimization-expert and not able to mentor this
project -
I just want to say that this sounds very interesting and I'm hoping that
maybe someone
else could mentor it, or, at least, leave some useful comments, as it sounds
like
something that could improve llvm a lot - doesn't it?

2011/4/6 Rafael Auler <rafaelauler at gmail.com>
> Hello,
>
> I want to present my project for GSoC 2011 for LLVM below. It would be very
> nice to hear suggestions and your opinion, thanks!
>
> Superoptimization for LLVM IR
>
> Objective
>
> This project focuses on implementing superoptimization algorithms targeted
> at
> the LLVM IR. The project uses arbitrary LLVM bitcode as a training set to
> discover new peephole optimizations that can be later integrated into LLVM
> instruction simplify phase or as a separate peephole optimization pass.
>
> Motivation
>
> Code optimization is the task of using a faster sequence of instructions
> which
> compute the same information, compared to a previous code sequence. The old
> sequence may then be substituted by the new one, creating a faster program
> with the same behavior. The problem of ﬁnding a code sequence that does
> some
> computation in optimal time is undecidable and, in practice, optimizations
> are
> discovered by humans using analytical reasoning. Finding new optimizations
> in
> an automated fashion, using a search algorithm, is a diﬀerent approach to
> the
> problem and became known in literature as superoptimization, term coined by
> Massalin [2].
> Using computer-aided optimization discovery, diﬀerent kind of optimizations
> unnoticed by humans may be detected, specially those speciﬁc to the target
> machine code [1]. This project will focus on implementing a superoptimizer
> which will exhaustively investigate new optimizations for the LLVM IR.
> Using
> this approach, I hope to greatly contribute to the current LLVM instruction
> simplify phase, introducing new substitutions identiﬁed in this project.
>
>
> Methodology
>
> There is already a superoptimizer for LLVM in initial stages of
> development.
> The existing superoptimizer currently limits its search space to a few
> transfor-
> mations. This project will focus on improving this current superoptimizer
> by
> extending the number of transformations applied and thus greatly increasing
> the search space.
>
> In order to test the superoptimizer eﬃciency, it will use LLVM bitcodes
> compiled from the entire LLVM test suite and SPEC to harvest possible code
> sequence optimizations. When successfully identiﬁed, these optimizations
> could
> be implemented directly as peephole optimizations for LLVM IR.
>
> Timeline
>
> This project is expected to last three months, as follows.
>
> • 1st month, 1st week - Study the LLVM IR in order to establish a
> list of axioms, that is, valid transformations that determine the search
> algorithm capability. There is an important trade-oﬀ to investigate here,
> since using fewer transformations will lead to faster searches, but several
> good optimizations may be missed.
> • 1st month, 2nd week - Thorough study of the current implemented
> base and new algorithm design to support arbitrary transformations.
> • 1st month, 3rd week until end of project - Implementation of the
> search algorithm.
> • 2nd month - Test phase begins. Test algorithm with simple transforma-
> tions searching for optimized code sequences.
> • 3rd month - Harvesting new optimizations and implementing ﬁxes to
> the search algorithm as more performance feedback from training data is
> received.
>
> Biography
>
> I am a Master student at the University of Campinas (UNICAMP), Brazil
> (time-
> zone GMT -3). In my master thesis, I have already worked with
> superoptimiza-
> tion theory and automatic code sequence searching. But instead of using it
> for
> optimization purposes, I use this to automatically generate a LLVM backend
> based on a machine description using the ArchC architecture description
> lan-
> guage. The backend patterns are ﬁxed and the implementation of the pattern
> is inferred using a search algorithm similar to the one used on
> superoptimizers.
> Thus, I have experience both on superoptimizer theory and LLVM code.
>
>
> References
>
> [1] Sorav Bansal and Alex Aiken. Automatic generation of peephole
> superopti-
> mizers. SIGPLAN Not., 41:394–403, October 2006.
>
> [2] Henry Massalin. Superoptimizer: a look at the smallest program. In
> ASPLOS-II: Proceedings of the second international conference on Architec-
> tual support for programming languages and operating systems, pages 122–
> 126, Los Alamitos, CA, USA, 1987. IEEE Computer Society Press.
>
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20110407/93916f93/attachment.html>

John Regehr

2011-Apr-07 15:09 UTC

head link

[LLVMdev] GSoC 2011: Superoptimization for LLVM IR

A bit of feedback on this proposal:

It worries me that you talk about "extending the number of 
transformations."  The point of a superoptimizer is that it has no 
specific model of transformations.  Rather, it is simply looking for cheap 
code fragments that are equivalent to expensive ones.

You have not specified how you will perform equivalence checking between 
code sequences.  The best answer that I know of is to use the technique 
from the Bansal and Aiken paper, which is to translate bitcode sequences 
into a SAT or SMT instance and just ask if they are equivalent functions.

You have not specified the cost model that you will use to assess the 
desirability of each potential transformation.  Simply replacing larger 
sequences with smaller ones is not necessarily a good idea since you may 
end up replacing a series of shift operations with a division. or 
something like that.

A linear sequence of LLVM instructions is perhaps not the best starting 
point.  A better primitive unit of code would be a subgraph of the PDG. 
This will ensure that the superoptimizer is looking at instructions that 
are semantically related.  Also, you will need to consider dependencies in 
any case to make this work, so you might as well do it from the outset.

John Regehr

John Regehr

2011-Apr-07 15:26 UTC

head link

[LLVMdev] GSoC 2011: Superoptimization for LLVM IR

> It worries me that you talk about "extending the number of
> transformations."  The point of a superoptimizer is that it has no
> specific model of transformations.  Rather, it is simply looking for cheap
> code fragments that are equivalent to expensive ones.
I could stand to be more clear...

Could you describe explictly the role of transformations in your 
superoptimizer framework?  And also give examples of what kinds of 
transformations you are talking about?

Could you describe how it is established that two code fragments are 
equivalent?

John

Duncan Sands

2011-Apr-07 16:46 UTC

head link

[LLVMdev] GSoC 2011: Superoptimization for LLVM IR

Hi Rafael,
> Methodology
>
> There is already a superoptimizer for LLVM in initial stages of
development.
> The existing superoptimizer currently limits its search space to a few
transfor-
> mations. This project will focus on improving this current superoptimizer
by
> extending the number of transformations applied and thus greatly increasing
> the search space.
it would be great if you could extend my superoptimizer to do fully general
superoptimization.
> In order to test the superoptimizer eﬃciency, it will use LLVM bitcodes
> compiled from the entire LLVM test suite and SPEC to harvest possible code
> sequence optimizations. When successfully identiﬁed, these optimizations
could
> be implemented directly as peephole optimizations for LLVM IR.
Right, this is exactly what has been done with the existing superoptimizer,
which, in spite of its simplicity, discovered a lot of commonly occurring
simplifications in SPEC, the LLVM testsuite and the Ada ACATS testsuite.  A
bunch of these were implemented in LLVM 2.9, mostly in InstructionSimplify.
> • 1st month, 1st week - Study the LLVM IR in order to establish a
> list of axioms, that is, valid transformations that determine the search
> algorithm capability. There is an important trade-oﬀ to investigate here,
> since using fewer transformations will lead to faster searches, but several
> good optimizations may be missed.
It sounds like you are planning to follow the approach of Joshi, Nelson and
Randall ("Denali: A Goal-directed Superoptimizer") in that you
don't intend
to exhaustively enumerate all possible code sequences, and see if they are
the same as the original only better; but instead start from the original code
sequence and repeatedly apply transformations that change code to equivalent
code, looking to see if you can eventually get something better than the
original.  Is that right?

Ciao, Duncan.

Rafael Auler

2011-Apr-07 22:03 UTC

head link

[LLVMdev] GSoC 2011: Superoptimization for LLVM IR

Hello all, thanks for the feedback!

It sounds like you are planning to follow the approach of Joshi, Nelson
and> Randall ("Denali: A Goal-directed Superoptimizer") in that you
don't intend
> to exhaustively enumerate all possible code sequences, and see if they are
> the same as the original only better; but instead start from the original
> code
> sequence and repeatedly apply transformations that change code to
> equivalent
> code, looking to see if you can eventually get something better than the
> original.  Is that right?
>
>Exactly. If we can use axioms/transformations/identities which don't modify
the code behavior, but just tries to interpret instruction semantics in a
different way, so as to find another (and hopefully simpler) instructions to
do the same job, the role of the SAT prover is no longer necessary, since
the search space is constrained to expressions which are semantically
equivalent.

Example:
Transformation:  (ADD  Operand_A, Contant 0)  <- equivalent to-> Operand_A

Using this transformation, the search algorithm now can simplify "add A
with
0" to "A" or, what may look initially not so useful, transform
any operand A
into an add with A plus 0. Later, this is important to prove we can use an
add instruction to load any operand. Well, I think you got the idea of using
a set of axioms... this, in practice, is important to prove an arbitrary
sequence A is equivalent to SequenceB, and, if B is shorter than A, use B
instead.  Speaking of implementation, there will be a big list of
transformations and a "for" loop trying each transformation and
recursively
applying more transformations, until we can show a code sequence is
equivalent to another. This is a short overview of the algorithm and how I
worked in this area. If you prefer another approach described formally in
another paper, I can study it and use another algorithm.

Note that this system is a simple theorem prover, since it applies several
equivalence rules to an expression to try to transform it to the answer. The
difference is that we don't have a "answer" to prove. Any faster
sequence
will suffice as "answer". Regarding the performance "shift vs.
multiply",
the instructions will have weights. Since we are working at the IR level, we
can't assume neither is good. That's why superoptimization is best
applied
as peephole optimization of a backend prior to code emission - when we know
more about the architecture, we can take more appropriate decisions and,
with superoptimization, use obscure target machine instructions to compute
apparently unrelated expressions. If we use this at the IR level, some
suggested optimizations may not necessarily lead to better code after the
backend pass (lowering/transformation to selectionDAG). That's the downside
of using it at the IR level. On the other side, many other simple
substitutions may be learned by the algorithm and improve the
instructionsimplify.

Thanks,

-Rafael
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20110407/ce89f157/attachment.html>

Mai, Haohui

2011-Apr-08 00:27 UTC

head link

[LLVMdev] GSoC 2011: Superoptimization for LLVM IR

IMO super optimizer would yield less benefits on LLVM compared to
other compilers.

If you check the patch of the instcombine pass, you'll find out people
keep dragging "correct" optimization out, not because the optimization
violates the semantic of LLVM IR, but it will generate wrong code
sequences when lowering to machine code.

An example:

%3 = fcmp %1, %2
%6 = fcmp %4, %5
%7 = and %3, %6
%8 = and %7, %foo

Sometimes you'll be screw if you want to play reassociate %7 and %8. I
don't see a easy way of catching them in theorem provers.

Haohui

On Thu, Apr 7, 2011 at 5:03 PM, Rafael Auler <rafaelauler at gmail.com>
wrote:> Hello all, thanks for the feedback!
>>
>> It sounds like you are planning to follow the approach of Joshi, Nelson
>> and
>> Randall ("Denali: A Goal-directed Superoptimizer") in that
you don't
>> intend
>> to exhaustively enumerate all possible code sequences, and see if they
are
>> the same as the original only better; but instead start from the
original
>> code
>> sequence and repeatedly apply transformations that change code to
>> equivalent
>> code, looking to see if you can eventually get something better than
the
>> original.  Is that right?
>>
>
> Exactly. If we can use axioms/transformations/identities which don't
modify
> the code behavior, but just tries to interpret instruction semantics in a
> different way, so as to find another (and hopefully simpler) instructions
to
> do the same job, the role of the SAT prover is no longer necessary, since
> the search space is constrained to expressions which are semantically
> equivalent.
> Example:
> Transformation:  (ADD  Operand_A, Contant 0)  <- equivalent to->
Operand_A
> Using this transformation, the search algorithm now can simplify "add
A with
> 0" to "A" or, what may look initially not so useful,
transform any operand A
> into an add with A plus 0. Later, this is important to prove we can use an
> add instruction to load any operand. Well, I think you got the idea of
using
> a set of axioms... this, in practice, is important to prove an arbitrary
> sequence A is equivalent to SequenceB, and, if B is shorter than A, use B
> instead.  Speaking of implementation, there will be a big list of
> transformations and a "for" loop trying each transformation and
recursively
> applying more transformations, until we can show a code sequence is
> equivalent to another. This is a short overview of the algorithm and how I
> worked in this area. If you prefer another approach described formally in
> another paper, I can study it and use another algorithm.
> Note that this system is a simple theorem prover, since it applies several
> equivalence rules to an expression to try to transform it to the answer.
The
> difference is that we don't have a "answer" to prove. Any
faster sequence
> will suffice as "answer". Regarding the performance "shift
vs. multiply",
> the instructions will have weights. Since we are working at the IR level,
we
> can't assume neither is good. That's why superoptimization is best
applied
> as peephole optimization of a backend prior to code emission - when we know
> more about the architecture, we can take more appropriate decisions and,
> with superoptimization, use obscure target machine instructions to compute
> apparently unrelated expressions. If we use this at the IR level, some
> suggested optimizations may not necessarily lead to better code after the
> backend pass (lowering/transformation to selectionDAG). That's the
downside
> of using it at the IR level. On the other side, many other simple
> substitutions may be learned by the algorithm and improve the
> instructionsimplify.
> Thanks,
> -Rafael
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>

Duncan Sands

2011-Apr-08 09:04 UTC

head link

[LLVMdev] GSoC 2011: Superoptimization for LLVM IR

Hi Rafael, don't forget to submit your proposal to GSOC (via the GSOC
web-page)
- the deadline is today!
>
>     It sounds like you are planning to follow the approach of Joshi, Nelson
and
>     Randall ("Denali: A Goal-directed Superoptimizer") in that
you don't intend
>     to exhaustively enumerate all possible code sequences, and see if they
are
>     the same as the original only better; but instead start from the
original code
>     sequence and repeatedly apply transformations that change code to
equivalent
>     code, looking to see if you can eventually get something better than
the
>     original.  Is that right?
>
>
> Exactly. If we can use axioms/transformations/identities which don't
modify the
> code behavior, but just tries to interpret instruction semantics in a
different
> way, so as to find another (and hopefully simpler) instructions to do the
same
> job, the role of the SAT prover is no longer necessary, since the search
space
> is constrained to expressions which are semantically equivalent.
>
> Example:
> Transformation:  (ADD  Operand_A, Contant 0) <- equivalent to->
Operand_A
>
> Using this transformation, the search algorithm now can simplify "add
A with 0"
> to "A" or, what may look initially not so useful, transform any
operand A into
> an add with A plus 0. Later, this is important to prove we can use an add
> instruction to load any operand. Well, I think you got the idea of using a
set
> of axioms... this, in practice, is important to prove an arbitrary sequence
A is
> equivalent to SequenceB, and, if B is shorter than A, use B instead. 
Speaking
> of implementation, there will be a big list of transformations and a
"for" loop
> trying each transformation and recursively applying more transformations,
until
> we can show a code sequence is equivalent to another. This is a short
overview
> of the algorithm and how I worked in this area. If you prefer another
approach
> described formally in another paper, I can study it and use another
algorithm.
I am fine with this approach, even though it can miss interesting reductions,
since it is way more efficient (from what I hear) than the fully general
approach, so should be able to handle code sequences longer than just a couple
of instructions.

Ciao, Duncan.

Maybe Matching Threads

Search for more possibly parallel threads

llvm dev - Apr 2011 - [LLVMdev] GSoC 2011: Superoptimization for LLVM IR

[LLVMdev] GSoC 2011: Superoptimization for LLVM IR

[LLVMdev] GSoC 2011: Superoptimization for LLVM IR

[LLVMdev] GSoC 2011: Superoptimization for LLVM IR

[LLVMdev] GSoC 2011: Superoptimization for LLVM IR

[LLVMdev] GSoC 2011: Superoptimization for LLVM IR

[LLVMdev] GSoC 2011: Superoptimization for LLVM IR

[LLVMdev] GSoC 2011: Superoptimization for LLVM IR

[LLVMdev] GSoC 2011: Superoptimization for LLVM IR

Maybe Matching Threads