thr3ads.net - llvm dev - [llvm-dev] [RFC] mir-canon: A new tool for canonicalizing MIR for cleaner diffing. [Aug 2017]

If this information is useful, please help other people find it:
Share via:

Puyan Lotfi via llvm-dev

2017-Aug-15 19:06 UTC

[llvm-dev] [RFC] mir-canon: A new tool for canonicalizing MIR for cleaner diffing.

Hi,


My name is Puyan and I've been exploring ways to improve the state of
instruction level diffing using llvm and MIR. Below is a proposal for a new
llvm tool meant to address issues encountered when diffing at the machine
level. I'm eager to hear the community's feedback.


Thanks


PL


mir-canon: A new tool for canonicalizing MIR for cleaner diffing.


Problem Statement and Context:

Diff-tools are regularly used for comparing IR and assembly files. This
often involves reasoning through differences that are semantically
equivalent and it can be very time consuming for a person to do said
reasoning.

Specifically in the context of GlobalISel development there are problems of
correctness verification. There is a need to compare two programs, compiled
from identical IR by two different instruction selectors in a way where the
true differences stand out.


Proposal:

We propose a new tool that we have tentatively named 'mir-canon' that
performs canonical transformations on MIR. The goal is for MIR
pre-processed with mir-canon to show fewer differences than if it were not
pre-processed.

At the time of this writing we have a prototype canonicalization tool.
We’ve come up with some techniques that show promise and would like to open
discussion with the community to get feedback and suggestions on refining
said techniques. Currently we think of this as an open ended project.


Techniques:

Our prototype does the following for each basic block in a Reverse Post
Ordering:


   -

   Canonical instruction ordering is done by moving a given instruction as
   close to the nearest use of its def as possible.



   -

   Next, canonical VReg renaming is done by building a collection of
   candidate instructions that can be thought of as sinks in the def-use
   graph: they are typically instructions that write to physical registers or
   store to memory. These candidates are used as the root of a breadth first
   walk over the vreg operand def-use graph that determines a canonical vreg
   ordering.



   -

   Using said canonical vreg ordering we rename monotonically, but before
   we do this we skip several vreg values in order to increase the chance that
   we land on the same vreg number for two different input MIR files. We also
   do this to reduce the chances that a difference in previously def-use walks
   will affect the vreg renaming for subsequent walks. This skipping step
   could be thought of as a kind of vreg number reckoning: we skip modulo n
   vregs so that we are likely to land on the same vreg for two different
   files.



This approach is completely agnostic of ISA specific semantics so it should
work for any target.


Current status:

At the moment we have a standalone llvm tool that uses a single pass to do
the above described transformations. We have test inputs that show promise
but we still need a wider set of tests as well as hard metrics.

Our approach processes a single file at a time. The primary benefit to this
approach is lower complexity in initial implementation and exploration of
building such a tool. We are open to other approaches such as an llvm-diff
like (two file at a time) approach, but we have not explored that avenue
fully yet.

We’re eager to hear community feedback and will be ready to share patches
for review soon.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170815/23dab9ee/attachment.html>

Puyan Lotfi via llvm-dev

2017-Aug-22 06:45 UTC

head link

[llvm-dev] [RFC] mir-canon: A new tool for canonicalizing MIR for cleaner diffing.

Ping.

Still working on preparing code for review. Will have a patch for review
ready in the coming days.

PL

On Tue, Aug 15, 2017 at 12:06 PM Puyan Lotfi <puyan.lotfi.llvm at
gmail.com>
wrote:
> Hi,
>
>
> My name is Puyan and I've been exploring ways to improve the state of
> instruction level diffing using llvm and MIR. Below is a proposal for a new
> llvm tool meant to address issues encountered when diffing at the machine
> level. I'm eager to hear the community's feedback.
>
>
> Thanks
>
>
> PL
>
>
> mir-canon: A new tool for canonicalizing MIR for cleaner diffing.
>
>
> Problem Statement and Context:
>
> Diff-tools are regularly used for comparing IR and assembly files. This
> often involves reasoning through differences that are semantically
> equivalent and it can be very time consuming for a person to do said
> reasoning.
>
> Specifically in the context of GlobalISel development there are problems
> of correctness verification. There is a need to compare two programs,
> compiled from identical IR by two different instruction selectors in a way
> where the true differences stand out.
>
>
> Proposal:
>
> We propose a new tool that we have tentatively named 'mir-canon'
that
> performs canonical transformations on MIR. The goal is for MIR
> pre-processed with mir-canon to show fewer differences than if it were not
> pre-processed.
>
> At the time of this writing we have a prototype canonicalization tool.
> We’ve come up with some techniques that show promise and would like to open
> discussion with the community to get feedback and suggestions on refining
> said techniques. Currently we think of this as an open ended project.
>
>
> Techniques:
>
> Our prototype does the following for each basic block in a Reverse Post
> Ordering:
>
>
>    -
>
>    Canonical instruction ordering is done by moving a given instruction
>    as close to the nearest use of its def as possible.
>
>
>
>    -
>
>    Next, canonical VReg renaming is done by building a collection of
>    candidate instructions that can be thought of as sinks in the def-use
>    graph: they are typically instructions that write to physical registers
or
>    store to memory. These candidates are used as the root of a breadth
first
>    walk over the vreg operand def-use graph that determines a canonical
vreg
>    ordering.
>
>
>
>    -
>
>    Using said canonical vreg ordering we rename monotonically, but before
>    we do this we skip several vreg values in order to increase the chance
that
>    we land on the same vreg number for two different input MIR files. We
also
>    do this to reduce the chances that a difference in previously def-use
walks
>    will affect the vreg renaming for subsequent walks. This skipping step
>    could be thought of as a kind of vreg number reckoning: we skip modulo n
>    vregs so that we are likely to land on the same vreg for two different
>    files.
>
>
>
> This approach is completely agnostic of ISA specific semantics so it
> should work for any target.
>
>
> Current status:
>
> At the moment we have a standalone llvm tool that uses a single pass to do
> the above described transformations. We have test inputs that show promise
> but we still need a wider set of tests as well as hard metrics.
>
> Our approach processes a single file at a time. The primary benefit to
> this approach is lower complexity in initial implementation and exploration
> of building such a tool. We are open to other approaches such as an
> llvm-diff like (two file at a time) approach, but we have not explored that
> avenue fully yet.
>
> We’re eager to hear community feedback and will be ready to share patches
> for review soon.
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170822/f547aabc/attachment.html>

Puyan Lotfi via llvm-dev

2017-Aug-22 21:09 UTC

head link

[llvm-dev] [RFC] mir-canon: A new tool for canonicalizing MIR for cleaner diffing.

Patch for review.




On Mon, Aug 21, 2017 at 11:45 PM Puyan Lotfi <puyan.lotfi.llvm at
gmail.com>
wrote:
> Ping.
>
> Still working on preparing code for review. Will have a patch for review
> ready in the coming days.
>
> PL
>
> On Tue, Aug 15, 2017 at 12:06 PM Puyan Lotfi <puyan.lotfi.llvm at
gmail.com>
> wrote:
>
>> Hi,
>>
>>
>> My name is Puyan and I've been exploring ways to improve the state
of
>> instruction level diffing using llvm and MIR. Below is a proposal for a
new
>> llvm tool meant to address issues encountered when diffing at the
machine
>> level. I'm eager to hear the community's feedback.
>>
>>
>> Thanks
>>
>>
>> PL
>>
>>
>> mir-canon: A new tool for canonicalizing MIR for cleaner diffing.
>>
>>
>> Problem Statement and Context:
>>
>> Diff-tools are regularly used for comparing IR and assembly files. This
>> often involves reasoning through differences that are semantically
>> equivalent and it can be very time consuming for a person to do said
>> reasoning.
>>
>> Specifically in the context of GlobalISel development there are
problems
>> of correctness verification. There is a need to compare two programs,
>> compiled from identical IR by two different instruction selectors in a
way
>> where the true differences stand out.
>>
>>
>> Proposal:
>>
>> We propose a new tool that we have tentatively named
'mir-canon' that
>> performs canonical transformations on MIR. The goal is for MIR
>> pre-processed with mir-canon to show fewer differences than if it were
not
>> pre-processed.
>>
>> At the time of this writing we have a prototype canonicalization tool.
>> We’ve come up with some techniques that show promise and would like to
open
>> discussion with the community to get feedback and suggestions on
refining
>> said techniques. Currently we think of this as an open ended project.
>>
>>
>> Techniques:
>>
>> Our prototype does the following for each basic block in a Reverse Post
>> Ordering:
>>
>>
>>    -
>>
>>    Canonical instruction ordering is done by moving a given instruction
>>    as close to the nearest use of its def as possible.
>>
>>
>>
>>    -
>>
>>    Next, canonical VReg renaming is done by building a collection of
>>    candidate instructions that can be thought of as sinks in the
def-use
>>    graph: they are typically instructions that write to physical
registers or
>>    store to memory. These candidates are used as the root of a breadth
first
>>    walk over the vreg operand def-use graph that determines a canonical
vreg
>>    ordering.
>>
>>
>>
>>    -
>>
>>    Using said canonical vreg ordering we rename monotonically, but
>>    before we do this we skip several vreg values in order to increase
the
>>    chance that we land on the same vreg number for two different input
MIR
>>    files. We also do this to reduce the chances that a difference in
>>    previously def-use walks will affect the vreg renaming for
subsequent
>>    walks. This skipping step could be thought of as a kind of vreg
number
>>    reckoning: we skip modulo n vregs so that we are likely to land on
the same
>>    vreg for two different files.
>>
>>
>>
>> This approach is completely agnostic of ISA specific semantics so it
>> should work for any target.
>>
>>
>> Current status:
>>
>> At the moment we have a standalone llvm tool that uses a single pass to
>> do the above described transformations. We have test inputs that show
>> promise but we still need a wider set of tests as well as hard metrics.
>>
>> Our approach processes a single file at a time. The primary benefit to
>> this approach is lower complexity in initial implementation and
exploration
>> of building such a tool. We are open to other approaches such as an
>> llvm-diff like (two file at a time) approach, but we have not explored
that
>> avenue fully yet.
>>
>> We’re eager to hear community feedback and will be ready to share
patches
>> for review soon.
>>
>>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170822/680e16a1/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mir-canon.patch
Type: application/octet-stream
Size: 27540 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170822/680e16a1/attachment.obj>

Alex Bradbury via llvm-dev

2017-Aug-28 12:21 UTC

head link

[llvm-dev] [RFC] mir-canon: A new tool for canonicalizing MIR for cleaner diffing.

On 15 August 2017 at 20:06, Puyan Lotfi via llvm-dev
<llvm-dev at lists.llvm.org> wrote:> Hi,
>
>
> My name is Puyan and I've been exploring ways to improve the state of
> instruction level diffing using llvm and MIR. Below is a proposal for a new
> llvm tool meant to address issues encountered when diffing at the machine
> level. I'm eager to hear the community's feedback.
Hi Puyan. Seeing as nobody else has expressed an opinion so far, I
thought I'd just say that this does sound like a useful tool. I don't
currently do any MIR-level testing, but this seems like it would be
helpful when that changes.

Best,

Alex

Jessica Paquette via llvm-dev

2017-Aug-31 21:12 UTC

head link

[llvm-dev] [RFC] mir-canon: A new tool for canonicalizing MIR for cleaner diffing.

Hey Puyan,

Do you think that mir-canon could be used with, say, the MachineOutliner/other
outlining/code size technologies?

If mir-canon is inter-procedural, I imagine that reducing the differences
between semantically equivalent sequences of instructions might reveal more
opportunities for outlining.

- Jessica
> On Aug 15, 2017, at 12:06 PM, Puyan Lotfi via llvm-dev <llvm-dev at
lists.llvm.org> wrote:
> 
> Hi,
> 
> My name is Puyan and I've been exploring ways to improve the state of
instruction level diffing using llvm and MIR. Below is a proposal for a new llvm
tool meant to address issues encountered when diffing at the machine level.
I'm eager to hear the community's feedback.
> 
> Thanks
> 
> PL
> 
> mir-canon: A new tool for canonicalizing MIR for cleaner diffing.
> 
> 
> Problem Statement and Context:
> 
> Diff-tools are regularly used for comparing IR and assembly files. This
often involves reasoning through differences that are semantically equivalent
and it can be very time consuming for a person to do said reasoning.
> 
> Specifically in the context of GlobalISel development there are problems of
correctness verification. There is a need to compare two programs, compiled from
identical IR by two different instruction selectors in a way where the true
differences stand out.
> 
> 
> Proposal:
> 
> We propose a new tool that we have tentatively named 'mir-canon'
that performs canonical transformations on MIR. The goal is for MIR
pre-processed with mir-canon to show fewer differences than if it were not
pre-processed.
> 
> At the time of this writing we have a prototype canonicalization tool.
We’ve come up with some techniques that show promise and would like to open
discussion with the community to get feedback and suggestions on refining said
techniques. Currently we think of this as an open ended project.
> 
> 
> Techniques:
> 
> Our prototype does the following for each basic block in a Reverse Post
Ordering:
> 
> Canonical instruction ordering is done by moving a given instruction as
close to the nearest use of its def as possible.
> 
> Next, canonical VReg renaming is done by building a collection of candidate
instructions that can be thought of as sinks in the def-use graph: they are
typically instructions that write to physical registers or store to memory.
These candidates are used as the root of a breadth first walk over the vreg
operand def-use graph that determines a canonical vreg ordering.
> 
> Using said canonical vreg ordering we rename monotonically, but before we
do this we skip several vreg values in order to increase the chance that we land
on the same vreg number for two different input MIR files. We also do this to
reduce the chances that a difference in previously def-use walks will affect the
vreg renaming for subsequent walks. This skipping step could be thought of as a
kind of vreg number reckoning: we skip modulo n vregs so that we are likely to
land on the same vreg for two different files.
> 
> 
> This approach is completely agnostic of ISA specific semantics so it should
work for any target.
> 
> 
> Current status:
> 
> At the moment we have a standalone llvm tool that uses a single pass to do
the above described transformations. We have test inputs that show promise but
we still need a wider set of tests as well as hard metrics.
> 
> Our approach processes a single file at a time. The primary benefit to this
approach is lower complexity in initial implementation and exploration of
building such a tool. We are open to other approaches such as an llvm-diff like
(two file at a time) approach, but we have not explored that avenue fully yet.
> 
> We’re eager to hear community feedback and will be ready to share patches
for review soon.
> 
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170831/c8617807/attachment.html>

Puyan Lotfi via llvm-dev

2017-Sep-06 15:11 UTC

head link

[llvm-dev] [RFC] mir-canon: A new tool for canonicalizing MIR for cleaner diffing.

Right now the mir-cannon Canonicalization pass is limited within a basic
block. I don't know the full details of how the outliner works, but I
suspect if outlining success rate is limited by factors like vreg names
and/or instruction ordering then it could be helpful. Of course, I think if
you used some canonicalization pass to reorder the schedule prior to
outlining, then you'd definitely want to run the outlined code through the
scheduler.

PL

On Thu, Aug 31, 2017 at 5:12 PM, Jessica Paquette <jpaquette at apple.com>
wrote:
> Hey Puyan,
>
> Do you think that mir-canon could be used with, say, the
> MachineOutliner/other outlining/code size technologies?
>
> If mir-canon is inter-procedural, I imagine that reducing the differences
> between semantically equivalent sequences of instructions might reveal more
> opportunities for outlining.
>
> - Jessica
>
> On Aug 15, 2017, at 12:06 PM, Puyan Lotfi via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
> Hi,
>
> My name is Puyan and I've been exploring ways to improve the state of
> instruction level diffing using llvm and MIR. Below is a proposal for a new
> llvm tool meant to address issues encountered when diffing at the machine
> level. I'm eager to hear the community's feedback.
>
> Thanks
>
> PL
>
> mir-canon: A new tool for canonicalizing MIR for cleaner diffing.
>
>
> Problem Statement and Context:
>
> Diff-tools are regularly used for comparing IR and assembly files. This
> often involves reasoning through differences that are semantically
> equivalent and it can be very time consuming for a person to do said
> reasoning.
>
> Specifically in the context of GlobalISel development there are problems
> of correctness verification. There is a need to compare two programs,
> compiled from identical IR by two different instruction selectors in a way
> where the true differences stand out.
>
>
> Proposal:
>
> We propose a new tool that we have tentatively named 'mir-canon'
that
> performs canonical transformations on MIR. The goal is for MIR
> pre-processed with mir-canon to show fewer differences than if it were not
> pre-processed.
>
> At the time of this writing we have a prototype canonicalization tool.
> We’ve come up with some techniques that show promise and would like to open
> discussion with the community to get feedback and suggestions on refining
> said techniques. Currently we think of this as an open ended project.
>
>
> Techniques:
>
> Our prototype does the following for each basic block in a Reverse Post
> Ordering:
>
>
>    - Canonical instruction ordering is done by moving a given instruction
>    as close to the nearest use of its def as possible.
>
>
>
>    - Next, canonical VReg renaming is done by building a collection of
>    candidate instructions that can be thought of as sinks in the def-use
>    graph: they are typically instructions that write to physical registers
or
>    store to memory. These candidates are used as the root of a breadth
first
>    walk over the vreg operand def-use graph that determines a canonical
vreg
>    ordering.
>
>
>
>    - Using said canonical vreg ordering we rename monotonically, but
>    before we do this we skip several vreg values in order to increase the
>    chance that we land on the same vreg number for two different input MIR
>    files. We also do this to reduce the chances that a difference in
>    previously def-use walks will affect the vreg renaming for subsequent
>    walks. This skipping step could be thought of as a kind of vreg number
>    reckoning: we skip modulo n vregs so that we are likely to land on the
same
>    vreg for two different files.
>
>
>
> This approach is completely agnostic of ISA specific semantics so it
> should work for any target.
>
>
> Current status:
>
> At the moment we have a standalone llvm tool that uses a single pass to do
> the above described transformations. We have test inputs that show promise
> but we still need a wider set of tests as well as hard metrics.
>
> Our approach processes a single file at a time. The primary benefit to
> this approach is lower complexity in initial implementation and exploration
> of building such a tool. We are open to other approaches such as an
> llvm-diff like (two file at a time) approach, but we have not explored that
> avenue fully yet.
>
> We’re eager to hear community feedback and will be ready to share patches
> for review soon.
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170906/f11d6e01/attachment.html>

Apparently Analagous Threads

Search for more reasonably related threads

llvm dev - Aug 2017 - [RFC] mir-canon: A new tool for canonicalizing MIR for cleaner diffing.

[llvm-dev] [RFC] mir-canon: A new tool for canonicalizing MIR for cleaner diffing.

[llvm-dev] [RFC] mir-canon: A new tool for canonicalizing MIR for cleaner diffing.

[llvm-dev] [RFC] mir-canon: A new tool for canonicalizing MIR for cleaner diffing.

[llvm-dev] [RFC] mir-canon: A new tool for canonicalizing MIR for cleaner diffing.

[llvm-dev] [RFC] mir-canon: A new tool for canonicalizing MIR for cleaner diffing.

[llvm-dev] [RFC] mir-canon: A new tool for canonicalizing MIR for cleaner diffing.

Apparently Analagous Threads