thr3ads.net - llvm dev - [llvm-dev] [RFC] mir-canon: A new tool for canonicalizing MIR for cleaner diffing. [Aug 2017]

If this information is useful, please help other people find it:
Share via:

Puyan Lotfi via llvm-dev

2017-Aug-22 21:09 UTC

[llvm-dev] [RFC] mir-canon: A new tool for canonicalizing MIR for cleaner diffing.

Patch for review.




On Mon, Aug 21, 2017 at 11:45 PM Puyan Lotfi <puyan.lotfi.llvm at
gmail.com>
wrote:
> Ping.
>
> Still working on preparing code for review. Will have a patch for review
> ready in the coming days.
>
> PL
>
> On Tue, Aug 15, 2017 at 12:06 PM Puyan Lotfi <puyan.lotfi.llvm at
gmail.com>
> wrote:
>
>> Hi,
>>
>>
>> My name is Puyan and I've been exploring ways to improve the state
of
>> instruction level diffing using llvm and MIR. Below is a proposal for a
new
>> llvm tool meant to address issues encountered when diffing at the
machine
>> level. I'm eager to hear the community's feedback.
>>
>>
>> Thanks
>>
>>
>> PL
>>
>>
>> mir-canon: A new tool for canonicalizing MIR for cleaner diffing.
>>
>>
>> Problem Statement and Context:
>>
>> Diff-tools are regularly used for comparing IR and assembly files. This
>> often involves reasoning through differences that are semantically
>> equivalent and it can be very time consuming for a person to do said
>> reasoning.
>>
>> Specifically in the context of GlobalISel development there are
problems
>> of correctness verification. There is a need to compare two programs,
>> compiled from identical IR by two different instruction selectors in a
way
>> where the true differences stand out.
>>
>>
>> Proposal:
>>
>> We propose a new tool that we have tentatively named
'mir-canon' that
>> performs canonical transformations on MIR. The goal is for MIR
>> pre-processed with mir-canon to show fewer differences than if it were
not
>> pre-processed.
>>
>> At the time of this writing we have a prototype canonicalization tool.
>> We’ve come up with some techniques that show promise and would like to
open
>> discussion with the community to get feedback and suggestions on
refining
>> said techniques. Currently we think of this as an open ended project.
>>
>>
>> Techniques:
>>
>> Our prototype does the following for each basic block in a Reverse Post
>> Ordering:
>>
>>
>>    -
>>
>>    Canonical instruction ordering is done by moving a given instruction
>>    as close to the nearest use of its def as possible.
>>
>>
>>
>>    -
>>
>>    Next, canonical VReg renaming is done by building a collection of
>>    candidate instructions that can be thought of as sinks in the
def-use
>>    graph: they are typically instructions that write to physical
registers or
>>    store to memory. These candidates are used as the root of a breadth
first
>>    walk over the vreg operand def-use graph that determines a canonical
vreg
>>    ordering.
>>
>>
>>
>>    -
>>
>>    Using said canonical vreg ordering we rename monotonically, but
>>    before we do this we skip several vreg values in order to increase
the
>>    chance that we land on the same vreg number for two different input
MIR
>>    files. We also do this to reduce the chances that a difference in
>>    previously def-use walks will affect the vreg renaming for
subsequent
>>    walks. This skipping step could be thought of as a kind of vreg
number
>>    reckoning: we skip modulo n vregs so that we are likely to land on
the same
>>    vreg for two different files.
>>
>>
>>
>> This approach is completely agnostic of ISA specific semantics so it
>> should work for any target.
>>
>>
>> Current status:
>>
>> At the moment we have a standalone llvm tool that uses a single pass to
>> do the above described transformations. We have test inputs that show
>> promise but we still need a wider set of tests as well as hard metrics.
>>
>> Our approach processes a single file at a time. The primary benefit to
>> this approach is lower complexity in initial implementation and
exploration
>> of building such a tool. We are open to other approaches such as an
>> llvm-diff like (two file at a time) approach, but we have not explored
that
>> avenue fully yet.
>>
>> We’re eager to hear community feedback and will be ready to share
patches
>> for review soon.
>>
>>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170822/680e16a1/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mir-canon.patch
Type: application/octet-stream
Size: 27540 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170822/680e16a1/attachment.obj>

Justin Bogner via llvm-dev

2017-Aug-29 18:17 UTC

head link

[llvm-dev] [RFC] mir-canon: A new tool for canonicalizing MIR for cleaner diffing.

Puyan Lotfi via llvm-dev <llvm-dev at lists.llvm.org>
writes:> mir-canon: A new tool for canonicalizing MIR for cleaner diffing.
>
> Problem Statement and Context:
>
> Diff-tools are regularly used for comparing IR and assembly files. This
often
> involves reasoning through differences that are semantically equivalent and
it
> can be very time consuming for a person to do said reasoning.
>
> Specifically in the context of GlobalISel development there are problems of
> correctness verification. There is a need to compare two programs, compiled
> from identical IR by two different instruction selectors in a way where the
> true differences stand out.
>
> Proposal:
>
> We propose a new tool that we have tentatively named 'mir-canon'
that performs
> canonical transformations on MIR. The goal is for MIR pre-processed with
> mir-canon to show fewer differences than if it were not pre-processed.
This sounds great. One place I'm hoping to use this for is in ISel
fuzzing - we can currently catch crashes and machine verifier errors,
but comparing SDAG to GlobalISel is a space that I'd really like to
explore.
> At the time of this writing we have a prototype canonicalization tool.
We’ve
> come up with some techniques that show promise and would like to open
> discussion with the community to get feedback and suggestions on refining
said
> techniques. Currently we think of this as an open ended project.
>
> Techniques:
>
> Our prototype does the following for each basic block in a Reverse Post
> Ordering:
>
>   * Canonical instruction ordering is done by moving a given instruction as
>     close to the nearest use of its def as possible.
>
>   * Next, canonical VReg renaming is done by building a collection of
>     candidate instructions that can be thought of as sinks in the def-use
>     graph: they are typically instructions that write to physical registers
or
>     store to memory. These candidates are used as the root of a breadth
first
>     walk over the vreg operand def-use graph that determines a canonical
vreg
>     ordering.
>
>   * Using said canonical vreg ordering we rename monotonically, but before
we
>     do this we skip several vreg values in order to increase the chance
that
>     we land on the same vreg number for two different input MIR files.
This is one place where handling multiple files at once would help, correct?
>     We also do this to reduce the chances that a difference in
>     previously def-use walks will affect the vreg renaming for
>     subsequent walks. This skipping step could be thought of as a kind
>     of vreg number reckoning: we skip modulo n vregs so that we are
>     likely to land on the same vreg for two different files.
>
> This approach is completely agnostic of ISA specific semantics so it should
> work for any target.
>
> Current status:
>
> At the moment we have a standalone llvm tool that uses a single pass to do
> the above described transformations. We have test inputs that show promise
> but we still need a wider set of tests as well as hard metrics.
This sounds like a great start, but I feel like this design won't quite
scale to all the ways we want to use this.

Consider, if this was a pass in the LLVM library rather than only
accessible in a tool, you could potentially set it up to run at the end
of an `llc` run. Similarly, if I want to use this in a fuzzer it would
be nice to be able to do the canonicalization in memory.
> Our approach processes a single file at a time. The primary benefit to this
> approach is lower complexity in initial implementation and exploration of
> building such a tool. We are open to other approaches such as an llvm-diff
> like (two file at a time) approach, but we have not explored that avenue
> fully yet.
>
> We’re eager to hear community feedback and will be ready to share patches
> for review soon.
 ...> Patch for review.
Please send the patch to llvm-commits (and refer back to this thread).
Patch review is too much traffic for llvm-dev.
> diff --git a/tools/mir-canon/CMakeLists.txt
b/tools/mir-canon/CMakeLists.txt
> new file mode 100644
> index 00000000000..eb6722e4bd4
> --- /dev/null
> +++ b/tools/mir-canon/CMakeLists.txt
> @@ -0,0 +1,28 @@
> +set(LLVM_LINK_COMPONENTS
> +  ${LLVM_TARGETS_TO_BUILD}
> +  Analysis
> +  AsmPrinter
> +  CodeGen
> +  Core
> +  IRReader
> +  MC
> +  MIRParser
> +  ScalarOpts
> +  SelectionDAG
> +  Support
> +  Target
> +  TransformUtils
> +  Vectorize
> +  )
> +
> +# Support plugins.
> +set(LLVM_NO_DEAD_STRIP 1)
> +
> +add_llvm_tool(mir-canon
> +  mir-canon.cpp
> +  MIRCanonicalizerPass.cpp
> +
> +  DEPENDS
> +  intrinsics_gen
> +  )
> +export_executable_symbols(mir-canon)
> diff --git a/tools/mir-canon/Common.h b/tools/mir-canon/Common.h
> new file mode 100644
> index 00000000000..3e6be7ef643
> --- /dev/null
> +++ b/tools/mir-canon/Common.h
> @@ -0,0 +1,21 @@
> +#include <cassert>
> +
> +namespace llvm {
> +/// The purpose of this pass is to establish a canonical operand naming so
> +/// that code compiled with slightly different IR passes can be diffed
more
> +/// effectively than otherwise. This is done by renaming vregs in a given
> +/// LiveRange in a canonical way.
> +extern char &MIRCanonicalizerID;
> +
> +void initializeMIRCanonicalizerPass(PassRegistry &);
> +
> +} // namespace llvm
> +
> +#define CANON_CHECK(CHK, MSG) \
> +  do { \
> +    if ((CHK)) break; \
> +    errs() << (std::string("") + MSG) <<
'\n'; \
> +    assert(false && "CANON_CHECK FAILURE."); \
> +    exit(EXIT_FAILURE); \
> +  } while (false)
> +
> diff --git a/tools/mir-canon/LLVMBuild.txt b/tools/mir-canon/LLVMBuild.txt
> new file mode 100644
> index 00000000000..5d3e6bb5a91
> --- /dev/null
> +++ b/tools/mir-canon/LLVMBuild.txt
> @@ -0,0 +1,22 @@
> +;===- ./tools/mir-canon/LLVMBuild.txt --------------------------*- Conf
-*--===;
> +;
> +;                     The LLVM Compiler Infrastructure
> +;
> +; This file is distributed under the University of Illinois Open Source
> +; License. See LICENSE.TXT for details.
> +;
>
+;===------------------------------------------------------------------------===;
> +;
> +; This is an LLVMBuild description file for the components in this
subdirectory.
> +;
> +; For more information on the LLVMBuild system, please see:
> +;
> +;   http://llvm.org/docs/LLVMBuild.html
> +;
>
+;===------------------------------------------------------------------------===;
> +
> +[component_0]
> +type = Tool
> +name = mir-canon
> +parent = Tools
> +required_libraries = AsmParser BitReader IRReader MIRParser TransformUtils
Scalar Vectorize all-targets
> diff --git a/tools/mir-canon/MIRCanonicalizerPass.cpp
b/tools/mir-canon/MIRCanonicalizerPass.cpp
> new file mode 100644
> index 00000000000..a4a0a7a0e5f
> --- /dev/null
> +++ b/tools/mir-canon/MIRCanonicalizerPass.cpp
> @@ -0,0 +1,588 @@
> +//===-------------- MIRCanonicalizer.cpp - MIR Canonicalizer
--------------===//
> +//
> +//                     The LLVM Compiler Infrastructure
> +//
> +// This file is distributed under the University of Illinois Open Source
> +// License. See LICENSE.TXT for details.
> +//
>
+//===----------------------------------------------------------------------===//
> +//
> +// Reorders instructions canonically.
> +// Renames virtual register operands canonically.
> +// Strips certain MIR artifacts (optionally).
> +//
>
+//===----------------------------------------------------------------------===//
> +
> +#include "llvm/CodeGen/Passes.h"
> +#include "llvm/CodeGen/MachineInstrBuilder.h"
> +#include "llvm/CodeGen/MachineFunctionPass.h"
> +#include "llvm/CodeGen/MachineRegisterInfo.h"
> +
> +#include "llvm/Support/raw_ostream.h"
> +#include "llvm/ADT/PostOrderIterator.h"
> +#include "llvm/Target/TargetInstrInfo.h"
> +
> +#include <tuple>
> +#include <queue>
> +
> +using namespace llvm;
> +
> +#include "Common.h"
> +
> +#define DEBUG_TYPE "mir-canonicalizer"
> +
> +static cl::opt<unsigned>
> +CanonicalizeFunctionNumber("canon-nth-function", cl::Hidden,
cl::init(~0u),
> +                           cl::value_desc("N"),
> +                           cl::desc("Function number to
canonicalize."));
> +
> +static cl::opt<unsigned>
> +CanonicalizeBasicBlockNumber("canon-nth-basicblock", cl::Hidden,
cl::init(~0u),
> +                             cl::value_desc("N"),
> +                             cl::desc("BasicBlock number to
canonicalize."));
> +
> +namespace {
> +
> +class MIRCanonicalizer : public MachineFunctionPass {
> +public:
> +  static char ID;
> +  MIRCanonicalizer() : MachineFunctionPass(ID) {}
> +
> +  StringRef getPassName() const override {
> +    return "Rename register operands in a canonical ordering.";
> +  }
> +
> +  void getAnalysisUsage(AnalysisUsage &AU) const override {
> +    AU.setPreservesCFG();
> +    MachineFunctionPass::getAnalysisUsage(AU);
> +  }
> +
> +  bool runOnMachineFunction(MachineFunction &MF) override;
> +};
> +
> +} // end anonymous namespace
> +
> +enum RSType { RSE_Reg = 0, RSE_FrameIndex, RSE_NewCandidate };
> +typedef std::tuple<RSType, unsigned> typedRegStackEntry;
> +
> +char MIRCanonicalizer::ID;
> +
> +char &llvm::MIRCanonicalizerID = MIRCanonicalizer::ID;
> +
> +INITIALIZE_PASS_BEGIN(MIRCanonicalizer, "mir-canonicalizer",
> +                      "Rename Register Operands Canonically",
false, false);
> +
> +INITIALIZE_PASS_END(MIRCanonicalizer, "mir-canonicalizer",
> +                    "Rename Register Operands Canonically",
false, false);
> +
> +static std::vector<MachineBasicBlock *> GetRPOList(MachineFunction
&MF) {
> +  MachineBasicBlock *Entry = &*MF.begin();
> +  std::vector<MachineBasicBlock*> RPOList;
> +  ReversePostOrderTraversal<MachineBasicBlock*> RPOT(Entry);
> +  for (ReversePostOrderTraversal<MachineBasicBlock*>::rpo_iterator
> +       MBBI = RPOT.begin(), MBBE = RPOT.end(); MBBI != MBBE; ++MBBI) {
> +    MachineBasicBlock *MBB = *MBBI;
> +    RPOList.push_back(MBB);
> +  }
> +
> +  return RPOList;
> +}
> +
> +// Set a dummy vreg. We use this vregs register class to generate
throw-away
> +// vregs that are used to skip vreg numbers so that vreg numbers line up.
> +static unsigned GetDummyVReg(const MachineFunction &MF) {
> +  for (auto &MBB : MF) {
> +    for (auto II = MBB.begin(), IE = MBB.end(); II != IE; ++II) {
> +      const MachineInstr *MI = &*II;
> +      for (unsigned i = 0; i < MI->getNumOperands(); i++) {
> +        const MachineOperand &MO = MI->getOperand(i);
> +        if (!MO.isReg() ||
!TargetRegisterInfo::isVirtualRegister(MO.getReg()))
> +          continue;
> +        return MO.getReg();
> +      }
> +    }
> +  }
> +
> +  return ~0U;
> +}
> +
> +static bool rescheduleCanoncally(MachineBasicBlock *MBB) {
> +
> +  bool Changed = false;
> +
> +  // Calculates the distance of MI from the begining of its parent BB.
> +  auto getInstrIdx = [](const MachineInstr &MI) {
> +    unsigned i = 0;
> +    for (auto &I : *MI.getParent()) {
> +      if (&I == &MI)
> +        return i;
> +      i++;
> +    }
> +    return ~0U;
> +  };
> +
> +  // Pre-Populate vector of instructions to reschedule so that we
don't
> +  // clobber the iterator.
> +  std::vector<MachineInstr *> instructions;
> +  for (auto II = MBB->begin(); II != MBB->end(); ++II) {
> +    instructions.push_back(&*II);
> +  }
> +
> +  for (auto II : instructions) {
> +    if (II->getNumOperands() == 0)
> +      continue;
> +
> +    MachineOperand &MO = II->getOperand(0);
> +    if (!MO.isReg() ||
!TargetRegisterInfo::isVirtualRegister(MO.getReg()))
> +      continue;
> +
> +    DEBUG(dbgs() << "Operand " << 0 << "
of "; II->dump(); MO.dump(););
> +
> +    MachineInstr *def = II;
> +    unsigned distance = ~0U;
> +    MachineInstr *useToBringDefCloserTo = nullptr;
> +    MachineRegisterInfo *MRI = &MBB->getParent()->getRegInfo();
> +    for (auto UI = MRI->use_begin(MO.getReg()), UE = MRI->use_end();
> +         UI != UE; ++UI) {
> +      MachineInstr *useInst = UI->getParent();
> +
> +      const unsigned defLoc = getInstrIdx(*def);
> +      const unsigned useLoc = getInstrIdx(*useInst);
> +      const unsigned delta = (useLoc - defLoc);
> +
> +      if (useInst->getParent() != def->getParent())
> +        continue;
> +      if (defLoc >= useLoc)
> +        continue;
> +
> +      if (delta < distance) {
> +        distance = delta;
> +        useToBringDefCloserTo = useInst;
> +      }
> +    }
> +
> +    MachineBasicBlock::iterator defI = MBB->instr_end();
> +    MachineBasicBlock::iterator useI = MBB->instr_end();
> +
> +    for (auto BBI = MBB->instr_begin(); BBI != MBB->instr_end();
++BBI) {
> +
> +      if (defI != MBB->instr_end() && useI !=
MBB->instr_end())
> +        break;
> +
> +      if ((&*BBI != def) && (&*BBI !=
useToBringDefCloserTo))
> +        continue;
> +
> +      if (&*BBI == def) {
> +        defI = BBI;
> +        continue;
> +      }
> +
> +      if (&*BBI == useToBringDefCloserTo) {
> +        useI = BBI;
> +        continue;
> +      }
> +    }
> +
> +    if (defI == MBB->instr_end() || useI == MBB->instr_end())
> +      continue;
> +
> +    DEBUG(dbgs() << "Splicing "; defI->dump(); dbgs()
<< " right before: ";
> +          useI->dump(););
> +
> +    Changed = true;
> +    MBB->splice(useI, MBB, defI);
> +  }
> +
> +  return Changed;
> +}
> +
> +static std::vector<MachineInstr *>
populateCandidates(MachineBasicBlock *MBB) {
> +  std::vector<MachineInstr *> candidates;
> +  MachineRegisterInfo &MRI = MBB->getParent()->getRegInfo();
> +
> +  for (auto II = MBB->begin(), IE = MBB->end(); II != IE; ++II) {
> +    MachineInstr *MI = &*II;
> +
> +    bool doesMISideEffect = false;
> +
> +    if (MI->getOperand(0).isReg()) {
> +      const unsigned dst = MI->getOperand(0).getReg();
> +      doesMISideEffect |= !TargetRegisterInfo::isVirtualRegister(dst);
> +
> +      for (auto UI = MRI.use_begin(dst); UI != MRI.use_end(); ++UI) {
> +        if (doesMISideEffect) break;
> +        doesMISideEffect |= (UI->getParent()->getParent() !=
MI->getParent());
> +      }
> +    }
> +
> +    if (!MI->mayStore() && !MI->isBranch() &&
!doesMISideEffect)
> +      continue;
> +
> +    DEBUG(dbgs() << "Found Candidate:  "; MI->dump(););
> +    candidates.push_back(MI);
> +  }
> +
> +  return candidates;
> +}
> +
> +void doCandidateWalk(std::vector<typedRegStackEntry> &VRegs,
> +                     std::queue <typedRegStackEntry> &reg_queue,
> +                     std::vector<MachineInstr *> &visitedMIs,
> +                     const MachineBasicBlock *MBB) {
> +
> +  const MachineFunction &MF = *MBB->getParent();
> +  const MachineRegisterInfo &MRI = MF.getRegInfo();
> +
> +  while (reg_queue.size()) {
> +    auto regType = std::get<0>(reg_queue.front());
> +    auto reg = std::get<1>(reg_queue.front());
> +    reg_queue.pop();
> +
> +    if (regType == RSE_FrameIndex) {
> +      DEBUG(dbgs() << "Popping frame index.\n";);
> +      VRegs.push_back(std::make_tuple(RSE_FrameIndex, ~0U));
> +      continue;
> +    }
> +
> +    assert(regType == RSE_Reg && "Expected vreg or
physreg.");
> +
> +    if (TargetRegisterInfo::isVirtualRegister(reg)) {
> +      DEBUG(dbgs() << "Popping vreg ";
> +          MRI.def_begin(reg)->dump();
> +          dbgs() << "\n";);
> +
> +      bool hasVisitedVReg = false;
> +      for (auto vreg : VRegs) {
> +        if (std::get<1>(vreg) == reg) {
> +          hasVisitedVReg = true;
> +          break;
> +        }
> +      }
> +
> +      if (!hasVisitedVReg)
> +        VRegs.push_back(std::make_tuple(RSE_Reg, reg));
> +    } else {
> +      DEBUG(dbgs() << "Popping physreg.\n";);
> +      VRegs.push_back(std::make_tuple(RSE_Reg, reg));
> +      continue;
> +    }
> +
> +    for (auto RI = MRI.def_begin(reg), RE = MRI.def_end(); RI != RE; ++RI)
{
> +      MachineInstr *def = RI->getParent();
> +
> +      if (def->getParent() != MBB)
> +        continue;
> +
> +      bool hasVisited = false;
> +      for (auto VMI : visitedMIs) {
> +        if (def == VMI) {
> +          hasVisited = true;
> +          break;
> +        }
> +      }
> +
> +      if (hasVisited)
> +        break;
> +
> +      DEBUG(dbgs() << "\n========================\n";
> +          dbgs() << "Visited MI: "; def->dump();
> +          dbgs() << "BB Name: " <<
def->getParent()->getName() << "\n";
> +          dbgs() << "\n========================\n";);
> +      visitedMIs.push_back(def);
> +      for (unsigned I = 1, E = def->getNumOperands(); I != E; ++I) {
> +
> +        MachineOperand &MO = def->getOperand(I);
> +        if (MO.isFI()) {
> +          DEBUG(dbgs() << "Pushing frame index.\n";);
> +          reg_queue.push(std::make_tuple(RSE_FrameIndex, ~0U));
> +        }
> +
> +        if (!MO.isReg())
> +          continue;
> +        reg_queue.push(std::make_tuple(RSE_Reg, MO.getReg()));
> +      }
> +    }
> +  }
> +}
> +
> +static void SkipVRegs(unsigned &VRegGapIndex, MachineRegisterInfo
&MRI,
> +                      const TargetRegisterClass *RC) {
> +  const unsigned VR_GAP = (++VRegGapIndex * 1000);
> +
> +  DEBUG(dbgs() << "Adjusting per-BB VR_GAP for BB"
<< VRegGapIndex << " to "
> +               << VR_GAP << "\n";);
> +
> +  unsigned I = MRI.createVirtualRegister(RC);
> +  const unsigned E = (((I + VR_GAP) / VR_GAP) + 1) * VR_GAP;
> +  while (I != E) {
> +    I = MRI.createVirtualRegister(RC);
> +  }
> +
> +  CANON_CHECK(I != ~0U && I != 0, "Invalid VReg");
> +}
> +
> +static void GetVRegRenameMap(std::map<unsigned, unsigned>
&vregRenameMap,
> +                             const std::vector<typedRegStackEntry>
&VRegs,
> +                             const std::vector<unsigned>
&renamedInOtherBB,
> +                             MachineRegisterInfo &MRI,
> +                             const TargetRegisterClass *RC) {
> +
> +  unsigned lastRenameReg = MRI.createVirtualRegister(RC);
> +
> +  bool firstCandidate = true;
> +  for (auto vreg : VRegs) {
> +
> +    auto regType = std::get<0>(vreg);
> +    auto reg = std::get<1>(vreg);
> +
> +    if (regType == RSE_FrameIndex) {
> +      lastRenameReg = MRI.createVirtualRegister(RC);
> +      DEBUG(dbgs() << "Skipping rename for FI " <<
lastRenameReg << "\n";);
> +      continue;
> +    } else if (regType == RSE_NewCandidate) {
> +      if (!firstCandidate) {
> +        while (lastRenameReg % 10) {
> +          lastRenameReg = MRI.createVirtualRegister(RC);
> +
> +          DEBUG(dbgs() << "Skipping rename for new candidate
" << lastRenameReg
> +                       << "\n";);
> +        }
> +      }
> +      firstCandidate = false;
> +      continue;
> +    } else if (!TargetRegisterInfo::isVirtualRegister(reg)) {
> +      lastRenameReg = MRI.createVirtualRegister(RC);
> +      DEBUG(dbgs() << "Skipping rename for Phys Reg "
<< lastRenameReg << "\n";);
> +      continue;
> +    }
> +
> +    if (std::find(renamedInOtherBB.begin(), renamedInOtherBB.end(), reg)
!> +        renamedInOtherBB.end()) {
> +      DEBUG(dbgs() << "Vreg " << reg << "
already renamed in other BB.\n";);
> +      continue;
> +    }
> +
> +    auto rename = MRI.createVirtualRegister(MRI.getRegClass(reg));
> +    lastRenameReg = rename;
> +
> +    if (vregRenameMap.find(reg) == vregRenameMap.end()) {
> +      DEBUG(dbgs() << "Mapping vreg ";);
> +      if (MRI.reg_begin(reg) != MRI.reg_end()) {
> +        DEBUG(auto foo = &*MRI.reg_begin(reg); foo->dump(););
> +      } else {
> +        DEBUG(dbgs() << reg;);
> +      }
> +      DEBUG(dbgs() << " to ";);
> +      if (MRI.reg_begin(rename) != MRI.reg_end()) {
> +        DEBUG(auto foo = &*MRI.reg_begin(rename); foo->dump(););
> +      } else {
> +        DEBUG(dbgs() << rename;);
> +      }
> +      DEBUG(dbgs() << "\n";);
> +
> +      vregRenameMap.insert(std::pair<unsigned, unsigned>(reg,
rename));
> +    }
> +  }
> +}
> +
> +static bool doVRegRenaming(std::vector<unsigned>
&renamedInOtherBB,
> +                           const std::map<unsigned, unsigned>
&vregRenameMap,
> +                           MachineRegisterInfo &MRI) {
> +  bool Changed = false;
> +  for (auto I = vregRenameMap.begin(), E = vregRenameMap.end(); I != E;
++I) {
> +
> +    auto vreg = I->first;
> +    auto rename = I->second;
> +
> +    renamedInOtherBB.push_back(rename);
> +
> +    std::vector<MachineOperand *> renameMOs;
> +    for (MachineOperand &MO : MRI.reg_operands(vreg)) {
> +      renameMOs.push_back(&MO);
> +    }
> +
> +    for (auto MO : renameMOs) {
> +      Changed = true;
> +      MO->setReg(rename);
> +
> +      if (!MO->isDef())
> +        MO->setIsKill(false);
> +    }
> +  }
> +
> +  return Changed;
> +}
> +
> +static bool doDefKillClear(MachineBasicBlock *MBB) {
> +  bool Changed = false;
> +
> +  for (auto II = MBB->begin(), IE = MBB->end(); II != IE; ++II) {
> +    MachineInstr *MI = &*II;
> +
> +    for (unsigned i = 0; i < MI->getNumOperands(); i++) {
> +      MachineOperand &MO = MI->getOperand(i);
> +      if (!MO.isReg())
> +        continue;
> +      if (!MO.isDef() && MO.isKill()) {
> +        Changed = true;
> +        MO.setIsKill(false);
> +      }
> +
> +      if (MO.isDef() && MO.isDead()) {
> +        Changed = true;
> +        MO.setIsDead(false);
> +      }
> +    }
> +  }
> +
> +  return Changed;
> +}
> +
> +static bool runOnBasicBlock(MachineBasicBlock *MBB,
> +                            std::vector<StringRef> &bbNames,
> +                            std::vector<unsigned>
&renamedInOtherBB,
> +                            unsigned &basicBlockNum, unsigned
&VRegGapIndex) {
> +
> +  if (CanonicalizeBasicBlockNumber != ~0U) {
> +    if (CanonicalizeBasicBlockNumber != basicBlockNum++)
> +      return false;
> +    DEBUG(dbgs() << "\n Canonicalizing BasicBlock "
<< MBB->getName() << "\n";);
> +  }
> +
> +  if (std::find(bbNames.begin(), bbNames.end(), MBB->getName()) !> +
bbNames.end()) {
> +    DEBUG(dbgs() << "Found potentially duplicate BasicBlocks:
"
> +                 << MBB->getName() << "\n";);
> +    return false;
> +  }
> +
> +  DEBUG(
> +    dbgs() << "\n\n  NEW BASIC BLOCK: " <<
MBB->getName() << "  \n\n";
> +    dbgs() <<
"\n\n================================================\n\n";
> +  );
> +
> +  bool Changed = false;
> +  MachineFunction &MF = *MBB->getParent();
> +  MachineRegisterInfo &MRI = MF.getRegInfo();
> +
> +  const TargetRegisterClass *dummyRC = [] (const MachineFunction &MF)
{
> +    const unsigned dummyVReg = GetDummyVReg(MF);
> +    CANON_CHECK(dummyVReg != ~0U, "Could not find a dummy
VReg.");
> +    const MachineRegisterInfo &MRI = MF.getRegInfo();
> +    return MRI.getRegClass(dummyVReg);
> +  } (MF);
> +
> +  bbNames.push_back(MBB->getName());
> +  DEBUG(dbgs() << "\n\n NEW BASIC BLOCK: " <<
MBB->getName() << "\n\n";);
> +
> +  DEBUG(dbgs() << "MBB Before Scheduling:\n";
MBB->dump(););
> +  Changed |= rescheduleCanoncally(MBB);
> +  DEBUG(dbgs() << "MBB After Scheduling:\n";
MBB->dump(););
> +
> +  std::vector<MachineInstr *> candidates = populateCandidates(MBB);
> +  std::vector<MachineInstr *> visitedMIs;
> +  std::copy(candidates.begin(), candidates.end(),
> +            std::back_inserter(visitedMIs));
> +
> +  std::vector<typedRegStackEntry> VRegs;
> +  for (auto candidate : candidates) {
> +    VRegs.push_back(std::make_tuple(RSE_NewCandidate, ~0U));
> +
> +    std::queue<typedRegStackEntry> reg_queue;
> +
> +    // Here we walk the vreg operands of a non-root node along our walk.
> +    // The root nodes are the original candidates (stores normally).
> +    // These are normally not the root nodes (except for the case of
copies to
> +    // physical registers).
> +    for (unsigned i = 1; i < candidate->getNumOperands(); i++) {
> +      if (candidate->mayStore() || candidate->isBranch())
> +        break;
> +
> +      MachineOperand &MO = candidate->getOperand(i);
> +      if (!(MO.isReg() &&
TargetRegisterInfo::isVirtualRegister(MO.getReg())))
> +        continue;
> +
> +      DEBUG(dbgs() << "Enqueue register"; MO.dump();
dbgs() << "\n";);
> +      reg_queue.push(std::make_tuple(RSE_Reg, MO.getReg()));
> +    }
> +
> +    // Here we walk the root candidates. We start from the 0th operand
because
> +    // the root is normally a store to a vreg.
> +    for (unsigned i = 0; i < candidate->getNumOperands(); i++) {
> +
> +      if (!candidate->mayStore() && !candidate->isBranch())
> +        break;
> +
> +      MachineOperand &MO = candidate->getOperand(i);
> +
> +      // TODO: Do we want to only add vregs here?
> +      if (!MO.isReg() && !MO.isFI())
> +        continue;
> +
> +      DEBUG(dbgs() << "Enqueue Reg/FI"; MO.dump(); dbgs()
<< "\n";);
> +
> +      reg_queue.push(MO.isReg() ? std::make_tuple(RSE_Reg, MO.getReg()) :
> +                                  std::make_tuple(RSE_FrameIndex,  ~0U));
> +    }
> +
> +    // Now that we have pushed the operands of the candidate here, we do
the
> +    // full breadth first walk.
> +    doCandidateWalk(VRegs, reg_queue, visitedMIs, MBB);
> +  }
> +
> +  // If we have populated no vregs to rename then bail.
> +  // The rest of this function does the vreg remaping.
> +  if (VRegs.size() == 0)
> +    return Changed;
> +
> +  // Skip some vregs, so we can recon where we'll land next.
> +  SkipVRegs(VRegGapIndex, MRI, dummyRC);
> +
> +  std::map<unsigned, unsigned> vregRenameMap;
> +  GetVRegRenameMap(vregRenameMap, VRegs, renamedInOtherBB, MRI, dummyRC);
> +
> +  Changed |= doVRegRenaming(renamedInOtherBB, vregRenameMap, MRI);
> +  Changed |= doDefKillClear(MBB);
> +
> +  DEBUG(dbgs() << "Updated MachineBasicBlock:\n";
MBB->dump(); dbgs() << "\n";);
> +  DEBUG(dbgs() <<
"\n\n================================================\n\n");
> +  return Changed;
> +}
> +
> +bool MIRCanonicalizer::runOnMachineFunction(MachineFunction &MF) {
> +
> +  static unsigned functionNum = 0;
> +  if (CanonicalizeFunctionNumber != ~0U) {
> +    if (CanonicalizeFunctionNumber != functionNum++)
> +      return false;
> +    DEBUG(dbgs() << "\n Canonicalizing Function " <<
MF.getName() << "\n";);
> +  }
> +
> +  // we need a valid vreg to create a vreg type for skipping all those
> +  // stray vreg numbers so reach alignment/canonical vreg values.
> +  std::vector<MachineBasicBlock*> RPOList = GetRPOList(MF);
> +
> +  DEBUG(
> +    dbgs() << "\n\n  NEW MACHINE FUNCTION: " <<
MF.getName() << "  \n\n";
> +    dbgs() <<
"\n\n================================================\n\n";
> +    dbgs() << "Total Basic Blocks: " <<
RPOList.size() << "\n";
> +    for (auto MBB : RPOList) {
> +      dbgs() << MBB->getName() << "\n";
> +    }
> +    dbgs() <<
"\n\n================================================\n\n";
> +  );
> +
> +  std::vector<StringRef> bbNames;
> +  std::vector<unsigned> renamedInOtherBB;
> +
> +  unsigned gapIdx = 0;
> +  unsigned bbNum = 0;
> +
> +  bool Changed = false;
> +
> +  for (auto MBB : RPOList)
> +    Changed |= runOnBasicBlock(MBB, bbNames, renamedInOtherBB, bbNum,
gapIdx);
> +
> +  return Changed;
> +}
> +
> diff --git a/tools/mir-canon/mir-canon.cpp b/tools/mir-canon/mir-canon.cpp
> new file mode 100644
> index 00000000000..bfe53c01906
> --- /dev/null
> +++ b/tools/mir-canon/mir-canon.cpp
> @@ -0,0 +1,175 @@
> +//===------ mir-canon.cpp - Implement the LLVM Native Code Generator
------===//
> +//
> +//                     The LLVM Compiler Infrastructure
> +//
> +// This file is distributed under the University of Illinois Open Source
> +// License. See LICENSE.TXT for details.
> +//
>
+//===----------------------------------------------------------------------===//
> +//
> +// This is the mir-canon driver. It invokes the MIR canonicalization pass.
> +//
>
+//===----------------------------------------------------------------------===//
> +
> +#include "llvm/CodeGen/CommandFlags.h"
> +#include "llvm/CodeGen/TargetPassConfig.h"
> +#include "llvm/CodeGen/MachineModuleInfo.h"
> +#include "llvm/CodeGen/MachineFunctionPass.h"
> +#include "llvm/CodeGen/MIRParser/MIRParser.h"
> +
> +#include "llvm/IR/LLVMContext.h"
> +#include "llvm/IR/DiagnosticInfo.h"
> +#include "llvm/IR/IRPrintingPasses.h"
> +#include "llvm/IR/DiagnosticPrinter.h"
> +#include "llvm/IR/LegacyPassManager.h"
> +
> +#include "llvm/Support/Signals.h"
> +#include "llvm/Support/FileSystem.h"
> +#include "llvm/Support/TargetSelect.h"
> +#include "llvm/Support/TargetRegistry.h"
> +#include "llvm/Support/ToolOutputFile.h"
> +#include "llvm/Support/FormattedStream.h"
> +#include "llvm/Support/PrettyStackTrace.h"
> +
> +#include "llvm/Target/TargetMachine.h"
> +#include "llvm/Target/TargetSubtargetInfo.h"
> +
> +#include "Common.h"
> +
> +using namespace llvm;
> +
> +static cl::opt<std::string>
> +    InputFilename(cl::Positional, cl::desc("<input.mir>"),
cl::init("-"));
> +
> +static cl::opt<std::string> OutputFilename("o",
cl::desc("Output filename"),
> +                                          
cl::value_desc("canon.mir"));
> +
> +static cl::opt<std::string> TargetTriple("mtriple",
> +                                         cl::desc("Override target
triple."));
> +
> +int main(int argc, char **argv) {
> +  sys::PrintStackTraceOnErrorSignal(argv[0]);
> +  PrettyStackTraceProgram X(argc, argv);
> +
> +  EnableDebugBuffering = true;
> +
> +  LLVMContext Context;
> +  llvm_shutdown_obj Y;
> +
> +  // Initialize targets first, so that --version shows registered targets.
> +  InitializeAllTargets();
> +  InitializeAllTargetMCs();
> +  InitializeAllAsmPrinters();
> +  InitializeAllAsmParsers();
> +
> +  PassRegistry *Registry = PassRegistry::getPassRegistry();
> +  initializeCore(*Registry);
> +  initializeCodeGen(*Registry);
> +
> +  initializeMIRCanonicalizerPass(*Registry);
> +
> +  // Register the target printer for --version.
> + 
cl::AddExtraVersionPrinter(TargetRegistry::printRegisteredTargetsForVersion);
> +
> +  cl::ParseCommandLineOptions(argc, argv, "llvm system
compiler\n");
> +
> +  CANON_CHECK(StringRef(InputFilename).endswith_lower(".mir"),
> +              "Invalid Filename: " + InputFilename + ".
" +
> +              "Expected a .mir file extension.");
> +
> +  // Set a diagnostic handler that doesn't exit on the first error
> +  bool HasError = false;
> +  Context.setDiagnosticHandler(
> +    [](const DiagnosticInfo &DI, void *Context) {
> +      if (DI.getSeverity() == DS_Error) {
> +        bool *HasError = static_cast<bool *>(Context);
> +        *HasError = true;
> +      }
> +
> +      auto *Remark =
dyn_cast<DiagnosticInfoOptimizationBase>(&DI);
> +      if (Remark && !Remark->isEnabled())
> +        return;
> +
> +      DiagnosticPrinterRawOStream DP(errs());
> +      errs() <<
LLVMContext::getDiagnosticMessagePrefix(DI.getSeverity());
> +      errs() << ": ";
> +      DI.print(DP);
> +      errs() << "\n";
> +    },
> +    &HasError);
> +
> +  SMDiagnostic Err;
> +  auto MIR = createMIRParserFromFile(InputFilename, Err, Context);
> +  auto M = MIR ? MIR->parseIRModule() : nullptr;
> +
> +  CANON_CHECK(M, argv[0] + "Invalid Module or MIR.");
> +
> +  if (!TargetTriple.empty())
> +    M->setTargetTriple(Triple::normalize(TargetTriple));
> +
> +  Triple TheTriple = Triple(M->getTargetTriple());
> +  if (TheTriple.getTriple().empty())
> +    TheTriple.setTriple(sys::getDefaultTargetTriple());
> +
> +  // Get the target specific parser.
> +  std::string Error;
> +  const Target *TheTarget > +      TargetRegistry::lookupTarget(MArch,
TheTriple, Error);
> +
> +  CANON_CHECK(TheTarget, argv[0] + ": " + Error);
> +
> +  std::string CPUStr = getCPUStr();
> +  std::string FeaturesStr = getFeaturesStr();
> +
> +  std::unique_ptr<TargetMachine>
Target(TheTarget->createTargetMachine(
> +      TheTriple.getTriple(), CPUStr, FeaturesStr,
> +      InitTargetOptionsFromCodeGenFlags(), getRelocModel(),
getCodeModel(),
> +      CodeGenOpt::Default));
> +
> +  CANON_CHECK(Target, "Could not allocate target machine!");
> +
> +  std::error_code EC;
> +  auto Out = llvm::make_unique<tool_output_file>(OutputFilename, EC,
> +                                                 sys::fs::F_Text);
> +  CANON_CHECK(!EC, EC.message());
> +  raw_pwrite_stream *OS = &Out->os();
> +
> +  legacy::PassManager PM;
> +  TargetLibraryInfoImpl TLII(Triple(M->getTargetTriple()));
> +  PM.add(new TargetLibraryInfoWrapperPass(TLII));
> +  M->setDataLayout(Target->createDataLayout());
> +  setFunctionAttributes(CPUStr, FeaturesStr, *M);
> +
> +  LLVMTargetMachine &TM = static_cast<LLVMTargetMachine
&>(*Target);
> +  TargetPassConfig &TPC = *TM.createPassConfig(PM);
> +  PM.add(&TPC);
> +  MachineModuleInfo *MMI = new MachineModuleInfo(&TM);
> +  PM.add(MMI);
> +  TPC.printAndVerify("");
> +
> +  StringRef PassName = "mir-canonicalizer";
> +  const PassRegistry *PR = PassRegistry::getPassRegistry();
> +  const PassInfo *PI = PR->getPassInfo(PassName);
> +
> +  CANON_CHECK(PI, argv[0] + ": " + PI->getPassName() + "
not registered.");
> +  CANON_CHECK(PI->getNormalCtor(),
> +              argv[0] + ": can't create pass: " +
PI->getPassName());
> +
> +  Pass *P = PI->getNormalCtor()();
> +  PM.add(P);
> +  TPC.printAndVerify(std::string("After ") +
std::string(P->getPassName()));
> +
> +  PM.add(createPrintMIRPass(*OS));
> +
> +  // Before executing passes, print the final values of the LLVM options.
> +  cl::PrintOptionValues();
> +
> +  PM.run(*M);
> +
> +  CANON_CHECK(!(*static_cast<bool
*>(Context.getDiagnosticContext())),
> +              argv[0] + " Error: exiting...");
> +
> +  Out->keep();
> +  return 0;
> +}
> +
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Gerolf Hoflehner via llvm-dev

2017-Aug-30 07:29 UTC

head link

[llvm-dev] [RFC] mir-canon: A new tool for canonicalizing MIR for cleaner diffing.

It sounds quite useful. I wonder though why did you decide in favor of a
stand-alone tool vs an LLVM pass that can be invoked under an option?  It seems
to me using the llvm infrastructure should provide the means for the
transformations the tool is performing. Maybe there is the sense that it is
faster to implement (assuming this is what you mean by lower complexity) an
off-line tool but there are benefits to leveraging llvm also.

I’m also curious about ways to link back difference sightings to actions in the
compiler. Does this come down to follow ups “by hand”? Or do you plan to sync
this with opt-viewer remarks? Or something else like “I (hopefully) know what to
do when I see a ‘canonical’ diff"? Perhaps you can share some thoughts
about it.

Cheers
Gerolf
> On Aug 29, 2017, at 11:17 AM, Justin Bogner via llvm-dev <llvm-dev at
lists.llvm.org> wrote:
> 
> Puyan Lotfi via llvm-dev <llvm-dev at lists.llvm.org> writes:
>> mir-canon: A new tool for canonicalizing MIR for cleaner diffing.
>> 
>> Problem Statement and Context:
>> 
>> Diff-tools are regularly used for comparing IR and assembly files. This
often
>> involves reasoning through differences that are semantically equivalent
and it
>> can be very time consuming for a person to do said reasoning.
>> 
>> Specifically in the context of GlobalISel development there are
problems of
>> correctness verification. There is a need to compare two programs,
compiled
>> from identical IR by two different instruction selectors in a way where
the
>> true differences stand out.
>> 
>> Proposal:
>> 
>> We propose a new tool that we have tentatively named
'mir-canon' that performs
>> canonical transformations on MIR. The goal is for MIR pre-processed
with
>> mir-canon to show fewer differences than if it were not pre-processed.
> 
> This sounds great. One place I'm hoping to use this for is in ISel
> fuzzing - we can currently catch crashes and machine verifier errors,
> but comparing SDAG to GlobalISel is a space that I'd really like to
> explore.
> 
>> At the time of this writing we have a prototype canonicalization tool.
We’ve
>> come up with some techniques that show promise and would like to open
>> discussion with the community to get feedback and suggestions on
refining said
>> techniques. Currently we think of this as an open ended project.
>> 
>> Techniques:
>> 
>> Our prototype does the following for each basic block in a Reverse Post
>> Ordering:
>> 
>>  * Canonical instruction ordering is done by moving a given instruction
as
>>    close to the nearest use of its def as possible.
>> 
>>  * Next, canonical VReg renaming is done by building a collection of
>>    candidate instructions that can be thought of as sinks in the
def-use
>>    graph: they are typically instructions that write to physical
registers or
>>    store to memory. These candidates are used as the root of a breadth
first
>>    walk over the vreg operand def-use graph that determines a canonical
vreg
>>    ordering.
>> 
>>  * Using said canonical vreg ordering we rename monotonically, but
before we
>>    do this we skip several vreg values in order to increase the chance
that
>>    we land on the same vreg number for two different input MIR files.
> 
> This is one place where handling multiple files at once would help,
correct?
> 
>>    We also do this to reduce the chances that a difference in
>>    previously def-use walks will affect the vreg renaming for
>>    subsequent walks. This skipping step could be thought of as a kind
>>    of vreg number reckoning: we skip modulo n vregs so that we are
>>    likely to land on the same vreg for two different files.
>> 
>> This approach is completely agnostic of ISA specific semantics so it
should
>> work for any target.
>> 
>> Current status:
>> 
>> At the moment we have a standalone llvm tool that uses a single pass to
do
>> the above described transformations. We have test inputs that show
promise
>> but we still need a wider set of tests as well as hard metrics.
> 
> This sounds like a great start, but I feel like this design won't quite
> scale to all the ways we want to use this.
> 
> Consider, if this was a pass in the LLVM library rather than only
> accessible in a tool, you could potentially set it up to run at the end
> of an `llc` run. Similarly, if I want to use this in a fuzzer it would
> be nice to be able to do the canonicalization in memory.
> 
>> Our approach processes a single file at a time. The primary benefit to
this
>> approach is lower complexity in initial implementation and exploration
of
>> building such a tool. We are open to other approaches such as an
llvm-diff
>> like (two file at a time) approach, but we have not explored that
avenue
>> fully yet.
>> 
>> We’re eager to hear community feedback and will be ready to share
patches
>> for review soon.
> ...
>> Patch for review.
> 
> Please send the patch to llvm-commits (and refer back to this thread).
> Patch review is too much traffic for llvm-dev.
> 
>> diff --git a/tools/mir-canon/CMakeLists.txt
b/tools/mir-canon/CMakeLists.txt
>> new file mode 100644
>> index 00000000000..eb6722e4bd4
>> --- /dev/null
>> +++ b/tools/mir-canon/CMakeLists.txt
>> @@ -0,0 +1,28 @@
>> +set(LLVM_LINK_COMPONENTS
>> +  ${LLVM_TARGETS_TO_BUILD}
>> +  Analysis
>> +  AsmPrinter
>> +  CodeGen
>> +  Core
>> +  IRReader
>> +  MC
>> +  MIRParser
>> +  ScalarOpts
>> +  SelectionDAG
>> +  Support
>> +  Target
>> +  TransformUtils
>> +  Vectorize
>> +  )
>> +
>> +# Support plugins.
>> +set(LLVM_NO_DEAD_STRIP 1)
>> +
>> +add_llvm_tool(mir-canon
>> +  mir-canon.cpp
>> +  MIRCanonicalizerPass.cpp
>> +
>> +  DEPENDS
>> +  intrinsics_gen
>> +  )
>> +export_executable_symbols(mir-canon)
>> diff --git a/tools/mir-canon/Common.h b/tools/mir-canon/Common.h
>> new file mode 100644
>> index 00000000000..3e6be7ef643
>> --- /dev/null
>> +++ b/tools/mir-canon/Common.h
>> @@ -0,0 +1,21 @@
>> +#include <cassert>
>> +
>> +namespace llvm {
>> +/// The purpose of this pass is to establish a canonical operand
naming so
>> +/// that code compiled with slightly different IR passes can be diffed
more
>> +/// effectively than otherwise. This is done by renaming vregs in a
given
>> +/// LiveRange in a canonical way.
>> +extern char &MIRCanonicalizerID;
>> +
>> +void initializeMIRCanonicalizerPass(PassRegistry &);
>> +
>> +} // namespace llvm
>> +
>> +#define CANON_CHECK(CHK, MSG) \
>> +  do { \
>> +    if ((CHK)) break; \
>> +    errs() << (std::string("") + MSG) <<
'\n'; \
>> +    assert(false && "CANON_CHECK FAILURE."); \
>> +    exit(EXIT_FAILURE); \
>> +  } while (false)
>> +
>> diff --git a/tools/mir-canon/LLVMBuild.txt
b/tools/mir-canon/LLVMBuild.txt
>> new file mode 100644
>> index 00000000000..5d3e6bb5a91
>> --- /dev/null
>> +++ b/tools/mir-canon/LLVMBuild.txt
>> @@ -0,0 +1,22 @@
>> +;===- ./tools/mir-canon/LLVMBuild.txt --------------------------*-
Conf -*--===;
>> +;
>> +;                     The LLVM Compiler Infrastructure
>> +;
>> +; This file is distributed under the University of Illinois Open
Source
>> +; License. See LICENSE.TXT for details.
>> +;
>>
+;===------------------------------------------------------------------------===;
>> +;
>> +; This is an LLVMBuild description file for the components in this
subdirectory.
>> +;
>> +; For more information on the LLVMBuild system, please see:
>> +;
>> +;   http://llvm.org/docs/LLVMBuild.html
>> +;
>>
+;===------------------------------------------------------------------------===;
>> +
>> +[component_0]
>> +type = Tool
>> +name = mir-canon
>> +parent = Tools
>> +required_libraries = AsmParser BitReader IRReader MIRParser
TransformUtils Scalar Vectorize all-targets
>> diff --git a/tools/mir-canon/MIRCanonicalizerPass.cpp
b/tools/mir-canon/MIRCanonicalizerPass.cpp
>> new file mode 100644
>> index 00000000000..a4a0a7a0e5f
>> --- /dev/null
>> +++ b/tools/mir-canon/MIRCanonicalizerPass.cpp
>> @@ -0,0 +1,588 @@
>> +//===-------------- MIRCanonicalizer.cpp - MIR Canonicalizer
--------------===//
>> +//
>> +//                     The LLVM Compiler Infrastructure
>> +//
>> +// This file is distributed under the University of Illinois Open
Source
>> +// License. See LICENSE.TXT for details.
>> +//
>>
+//===----------------------------------------------------------------------===//
>> +//
>> +// Reorders instructions canonically.
>> +// Renames virtual register operands canonically.
>> +// Strips certain MIR artifacts (optionally).
>> +//
>>
+//===----------------------------------------------------------------------===//
>> +
>> +#include "llvm/CodeGen/Passes.h"
>> +#include "llvm/CodeGen/MachineInstrBuilder.h"
>> +#include "llvm/CodeGen/MachineFunctionPass.h"
>> +#include "llvm/CodeGen/MachineRegisterInfo.h"
>> +
>> +#include "llvm/Support/raw_ostream.h"
>> +#include "llvm/ADT/PostOrderIterator.h"
>> +#include "llvm/Target/TargetInstrInfo.h"
>> +
>> +#include <tuple>
>> +#include <queue>
>> +
>> +using namespace llvm;
>> +
>> +#include "Common.h"
>> +
>> +#define DEBUG_TYPE "mir-canonicalizer"
>> +
>> +static cl::opt<unsigned>
>> +CanonicalizeFunctionNumber("canon-nth-function", cl::Hidden,
cl::init(~0u),
>> +                           cl::value_desc("N"),
>> +                           cl::desc("Function number to
canonicalize."));
>> +
>> +static cl::opt<unsigned>
>> +CanonicalizeBasicBlockNumber("canon-nth-basicblock",
cl::Hidden, cl::init(~0u),
>> +                             cl::value_desc("N"),
>> +                             cl::desc("BasicBlock number to
canonicalize."));
>> +
>> +namespace {
>> +
>> +class MIRCanonicalizer : public MachineFunctionPass {
>> +public:
>> +  static char ID;
>> +  MIRCanonicalizer() : MachineFunctionPass(ID) {}
>> +
>> +  StringRef getPassName() const override {
>> +    return "Rename register operands in a canonical
ordering.";
>> +  }
>> +
>> +  void getAnalysisUsage(AnalysisUsage &AU) const override {
>> +    AU.setPreservesCFG();
>> +    MachineFunctionPass::getAnalysisUsage(AU);
>> +  }
>> +
>> +  bool runOnMachineFunction(MachineFunction &MF) override;
>> +};
>> +
>> +} // end anonymous namespace
>> +
>> +enum RSType { RSE_Reg = 0, RSE_FrameIndex, RSE_NewCandidate };
>> +typedef std::tuple<RSType, unsigned> typedRegStackEntry;
>> +
>> +char MIRCanonicalizer::ID;
>> +
>> +char &llvm::MIRCanonicalizerID = MIRCanonicalizer::ID;
>> +
>> +INITIALIZE_PASS_BEGIN(MIRCanonicalizer, "mir-canonicalizer",
>> +                      "Rename Register Operands
Canonically", false, false);
>> +
>> +INITIALIZE_PASS_END(MIRCanonicalizer, "mir-canonicalizer",
>> +                    "Rename Register Operands Canonically",
false, false);
>> +
>> +static std::vector<MachineBasicBlock *>
GetRPOList(MachineFunction &MF) {
>> +  MachineBasicBlock *Entry = &*MF.begin();
>> +  std::vector<MachineBasicBlock*> RPOList;
>> +  ReversePostOrderTraversal<MachineBasicBlock*> RPOT(Entry);
>> +  for
(ReversePostOrderTraversal<MachineBasicBlock*>::rpo_iterator
>> +       MBBI = RPOT.begin(), MBBE = RPOT.end(); MBBI != MBBE; ++MBBI) {
>> +    MachineBasicBlock *MBB = *MBBI;
>> +    RPOList.push_back(MBB);
>> +  }
>> +
>> +  return RPOList;
>> +}
>> +
>> +// Set a dummy vreg. We use this vregs register class to generate
throw-away
>> +// vregs that are used to skip vreg numbers so that vreg numbers line
up.
>> +static unsigned GetDummyVReg(const MachineFunction &MF) {
>> +  for (auto &MBB : MF) {
>> +    for (auto II = MBB.begin(), IE = MBB.end(); II != IE; ++II) {
>> +      const MachineInstr *MI = &*II;
>> +      for (unsigned i = 0; i < MI->getNumOperands(); i++) {
>> +        const MachineOperand &MO = MI->getOperand(i);
>> +        if (!MO.isReg() ||
!TargetRegisterInfo::isVirtualRegister(MO.getReg()))
>> +          continue;
>> +        return MO.getReg();
>> +      }
>> +    }
>> +  }
>> +
>> +  return ~0U;
>> +}
>> +
>> +static bool rescheduleCanoncally(MachineBasicBlock *MBB) {
>> +
>> +  bool Changed = false;
>> +
>> +  // Calculates the distance of MI from the begining of its parent BB.
>> +  auto getInstrIdx = [](const MachineInstr &MI) {
>> +    unsigned i = 0;
>> +    for (auto &I : *MI.getParent()) {
>> +      if (&I == &MI)
>> +        return i;
>> +      i++;
>> +    }
>> +    return ~0U;
>> +  };
>> +
>> +  // Pre-Populate vector of instructions to reschedule so that we
don't
>> +  // clobber the iterator.
>> +  std::vector<MachineInstr *> instructions;
>> +  for (auto II = MBB->begin(); II != MBB->end(); ++II) {
>> +    instructions.push_back(&*II);
>> +  }
>> +
>> +  for (auto II : instructions) {
>> +    if (II->getNumOperands() == 0)
>> +      continue;
>> +
>> +    MachineOperand &MO = II->getOperand(0);
>> +    if (!MO.isReg() ||
!TargetRegisterInfo::isVirtualRegister(MO.getReg()))
>> +      continue;
>> +
>> +    DEBUG(dbgs() << "Operand " << 0 <<
" of "; II->dump(); MO.dump(););
>> +
>> +    MachineInstr *def = II;
>> +    unsigned distance = ~0U;
>> +    MachineInstr *useToBringDefCloserTo = nullptr;
>> +    MachineRegisterInfo *MRI =
&MBB->getParent()->getRegInfo();
>> +    for (auto UI = MRI->use_begin(MO.getReg()), UE =
MRI->use_end();
>> +         UI != UE; ++UI) {
>> +      MachineInstr *useInst = UI->getParent();
>> +
>> +      const unsigned defLoc = getInstrIdx(*def);
>> +      const unsigned useLoc = getInstrIdx(*useInst);
>> +      const unsigned delta = (useLoc - defLoc);
>> +
>> +      if (useInst->getParent() != def->getParent())
>> +        continue;
>> +      if (defLoc >= useLoc)
>> +        continue;
>> +
>> +      if (delta < distance) {
>> +        distance = delta;
>> +        useToBringDefCloserTo = useInst;
>> +      }
>> +    }
>> +
>> +    MachineBasicBlock::iterator defI = MBB->instr_end();
>> +    MachineBasicBlock::iterator useI = MBB->instr_end();
>> +
>> +    for (auto BBI = MBB->instr_begin(); BBI != MBB->instr_end();
++BBI) {
>> +
>> +      if (defI != MBB->instr_end() && useI !=
MBB->instr_end())
>> +        break;
>> +
>> +      if ((&*BBI != def) && (&*BBI !=
useToBringDefCloserTo))
>> +        continue;
>> +
>> +      if (&*BBI == def) {
>> +        defI = BBI;
>> +        continue;
>> +      }
>> +
>> +      if (&*BBI == useToBringDefCloserTo) {
>> +        useI = BBI;
>> +        continue;
>> +      }
>> +    }
>> +
>> +    if (defI == MBB->instr_end() || useI == MBB->instr_end())
>> +      continue;
>> +
>> +    DEBUG(dbgs() << "Splicing "; defI->dump();
dbgs() << " right before: ";
>> +          useI->dump(););
>> +
>> +    Changed = true;
>> +    MBB->splice(useI, MBB, defI);
>> +  }
>> +
>> +  return Changed;
>> +}
>> +
>> +static std::vector<MachineInstr *>
populateCandidates(MachineBasicBlock *MBB) {
>> +  std::vector<MachineInstr *> candidates;
>> +  MachineRegisterInfo &MRI = MBB->getParent()->getRegInfo();
>> +
>> +  for (auto II = MBB->begin(), IE = MBB->end(); II != IE; ++II)
{
>> +    MachineInstr *MI = &*II;
>> +
>> +    bool doesMISideEffect = false;
>> +
>> +    if (MI->getOperand(0).isReg()) {
>> +      const unsigned dst = MI->getOperand(0).getReg();
>> +      doesMISideEffect |= !TargetRegisterInfo::isVirtualRegister(dst);
>> +
>> +      for (auto UI = MRI.use_begin(dst); UI != MRI.use_end(); ++UI) {
>> +        if (doesMISideEffect) break;
>> +        doesMISideEffect |= (UI->getParent()->getParent() !=
MI->getParent());
>> +      }
>> +    }
>> +
>> +    if (!MI->mayStore() && !MI->isBranch() &&
!doesMISideEffect)
>> +      continue;
>> +
>> +    DEBUG(dbgs() << "Found Candidate:  ";
MI->dump(););
>> +    candidates.push_back(MI);
>> +  }
>> +
>> +  return candidates;
>> +}
>> +
>> +void doCandidateWalk(std::vector<typedRegStackEntry> &VRegs,
>> +                     std::queue <typedRegStackEntry>
&reg_queue,
>> +                     std::vector<MachineInstr *>
&visitedMIs,
>> +                     const MachineBasicBlock *MBB) {
>> +
>> +  const MachineFunction &MF = *MBB->getParent();
>> +  const MachineRegisterInfo &MRI = MF.getRegInfo();
>> +
>> +  while (reg_queue.size()) {
>> +    auto regType = std::get<0>(reg_queue.front());
>> +    auto reg = std::get<1>(reg_queue.front());
>> +    reg_queue.pop();
>> +
>> +    if (regType == RSE_FrameIndex) {
>> +      DEBUG(dbgs() << "Popping frame index.\n";);
>> +      VRegs.push_back(std::make_tuple(RSE_FrameIndex, ~0U));
>> +      continue;
>> +    }
>> +
>> +    assert(regType == RSE_Reg && "Expected vreg or
physreg.");
>> +
>> +    if (TargetRegisterInfo::isVirtualRegister(reg)) {
>> +      DEBUG(dbgs() << "Popping vreg ";
>> +          MRI.def_begin(reg)->dump();
>> +          dbgs() << "\n";);
>> +
>> +      bool hasVisitedVReg = false;
>> +      for (auto vreg : VRegs) {
>> +        if (std::get<1>(vreg) == reg) {
>> +          hasVisitedVReg = true;
>> +          break;
>> +        }
>> +      }
>> +
>> +      if (!hasVisitedVReg)
>> +        VRegs.push_back(std::make_tuple(RSE_Reg, reg));
>> +    } else {
>> +      DEBUG(dbgs() << "Popping physreg.\n";);
>> +      VRegs.push_back(std::make_tuple(RSE_Reg, reg));
>> +      continue;
>> +    }
>> +
>> +    for (auto RI = MRI.def_begin(reg), RE = MRI.def_end(); RI != RE;
++RI) {
>> +      MachineInstr *def = RI->getParent();
>> +
>> +      if (def->getParent() != MBB)
>> +        continue;
>> +
>> +      bool hasVisited = false;
>> +      for (auto VMI : visitedMIs) {
>> +        if (def == VMI) {
>> +          hasVisited = true;
>> +          break;
>> +        }
>> +      }
>> +
>> +      if (hasVisited)
>> +        break;
>> +
>> +      DEBUG(dbgs() << "\n========================\n";
>> +          dbgs() << "Visited MI: "; def->dump();
>> +          dbgs() << "BB Name: " <<
def->getParent()->getName() << "\n";
>> +          dbgs() << "\n========================\n";);
>> +      visitedMIs.push_back(def);
>> +      for (unsigned I = 1, E = def->getNumOperands(); I != E; ++I)
{
>> +
>> +        MachineOperand &MO = def->getOperand(I);
>> +        if (MO.isFI()) {
>> +          DEBUG(dbgs() << "Pushing frame index.\n";);
>> +          reg_queue.push(std::make_tuple(RSE_FrameIndex, ~0U));
>> +        }
>> +
>> +        if (!MO.isReg())
>> +          continue;
>> +        reg_queue.push(std::make_tuple(RSE_Reg, MO.getReg()));
>> +      }
>> +    }
>> +  }
>> +}
>> +
>> +static void SkipVRegs(unsigned &VRegGapIndex, MachineRegisterInfo
&MRI,
>> +                      const TargetRegisterClass *RC) {
>> +  const unsigned VR_GAP = (++VRegGapIndex * 1000);
>> +
>> +  DEBUG(dbgs() << "Adjusting per-BB VR_GAP for BB"
<< VRegGapIndex << " to "
>> +               << VR_GAP << "\n";);
>> +
>> +  unsigned I = MRI.createVirtualRegister(RC);
>> +  const unsigned E = (((I + VR_GAP) / VR_GAP) + 1) * VR_GAP;
>> +  while (I != E) {
>> +    I = MRI.createVirtualRegister(RC);
>> +  }
>> +
>> +  CANON_CHECK(I != ~0U && I != 0, "Invalid VReg");
>> +}
>> +
>> +static void GetVRegRenameMap(std::map<unsigned, unsigned>
&vregRenameMap,
>> +                             const
std::vector<typedRegStackEntry> &VRegs,
>> +                             const std::vector<unsigned>
&renamedInOtherBB,
>> +                             MachineRegisterInfo &MRI,
>> +                             const TargetRegisterClass *RC) {
>> +
>> +  unsigned lastRenameReg = MRI.createVirtualRegister(RC);
>> +
>> +  bool firstCandidate = true;
>> +  for (auto vreg : VRegs) {
>> +
>> +    auto regType = std::get<0>(vreg);
>> +    auto reg = std::get<1>(vreg);
>> +
>> +    if (regType == RSE_FrameIndex) {
>> +      lastRenameReg = MRI.createVirtualRegister(RC);
>> +      DEBUG(dbgs() << "Skipping rename for FI "
<< lastRenameReg << "\n";);
>> +      continue;
>> +    } else if (regType == RSE_NewCandidate) {
>> +      if (!firstCandidate) {
>> +        while (lastRenameReg % 10) {
>> +          lastRenameReg = MRI.createVirtualRegister(RC);
>> +
>> +          DEBUG(dbgs() << "Skipping rename for new
candidate " << lastRenameReg
>> +                       << "\n";);
>> +        }
>> +      }
>> +      firstCandidate = false;
>> +      continue;
>> +    } else if (!TargetRegisterInfo::isVirtualRegister(reg)) {
>> +      lastRenameReg = MRI.createVirtualRegister(RC);
>> +      DEBUG(dbgs() << "Skipping rename for Phys Reg "
<< lastRenameReg << "\n";);
>> +      continue;
>> +    }
>> +
>> +    if (std::find(renamedInOtherBB.begin(), renamedInOtherBB.end(),
reg) !>> +        renamedInOtherBB.end()) {
>> +      DEBUG(dbgs() << "Vreg " << reg <<
" already renamed in other BB.\n";);
>> +      continue;
>> +    }
>> +
>> +    auto rename = MRI.createVirtualRegister(MRI.getRegClass(reg));
>> +    lastRenameReg = rename;
>> +
>> +    if (vregRenameMap.find(reg) == vregRenameMap.end()) {
>> +      DEBUG(dbgs() << "Mapping vreg ";);
>> +      if (MRI.reg_begin(reg) != MRI.reg_end()) {
>> +        DEBUG(auto foo = &*MRI.reg_begin(reg); foo->dump(););
>> +      } else {
>> +        DEBUG(dbgs() << reg;);
>> +      }
>> +      DEBUG(dbgs() << " to ";);
>> +      if (MRI.reg_begin(rename) != MRI.reg_end()) {
>> +        DEBUG(auto foo = &*MRI.reg_begin(rename);
foo->dump(););
>> +      } else {
>> +        DEBUG(dbgs() << rename;);
>> +      }
>> +      DEBUG(dbgs() << "\n";);
>> +
>> +      vregRenameMap.insert(std::pair<unsigned, unsigned>(reg,
rename));
>> +    }
>> +  }
>> +}
>> +
>> +static bool doVRegRenaming(std::vector<unsigned>
&renamedInOtherBB,
>> +                           const std::map<unsigned, unsigned>
&vregRenameMap,
>> +                           MachineRegisterInfo &MRI) {
>> +  bool Changed = false;
>> +  for (auto I = vregRenameMap.begin(), E = vregRenameMap.end(); I !=
E; ++I) {
>> +
>> +    auto vreg = I->first;
>> +    auto rename = I->second;
>> +
>> +    renamedInOtherBB.push_back(rename);
>> +
>> +    std::vector<MachineOperand *> renameMOs;
>> +    for (MachineOperand &MO : MRI.reg_operands(vreg)) {
>> +      renameMOs.push_back(&MO);
>> +    }
>> +
>> +    for (auto MO : renameMOs) {
>> +      Changed = true;
>> +      MO->setReg(rename);
>> +
>> +      if (!MO->isDef())
>> +        MO->setIsKill(false);
>> +    }
>> +  }
>> +
>> +  return Changed;
>> +}
>> +
>> +static bool doDefKillClear(MachineBasicBlock *MBB) {
>> +  bool Changed = false;
>> +
>> +  for (auto II = MBB->begin(), IE = MBB->end(); II != IE; ++II)
{
>> +    MachineInstr *MI = &*II;
>> +
>> +    for (unsigned i = 0; i < MI->getNumOperands(); i++) {
>> +      MachineOperand &MO = MI->getOperand(i);
>> +      if (!MO.isReg())
>> +        continue;
>> +      if (!MO.isDef() && MO.isKill()) {
>> +        Changed = true;
>> +        MO.setIsKill(false);
>> +      }
>> +
>> +      if (MO.isDef() && MO.isDead()) {
>> +        Changed = true;
>> +        MO.setIsDead(false);
>> +      }
>> +    }
>> +  }
>> +
>> +  return Changed;
>> +}
>> +
>> +static bool runOnBasicBlock(MachineBasicBlock *MBB,
>> +                            std::vector<StringRef> &bbNames,
>> +                            std::vector<unsigned>
&renamedInOtherBB,
>> +                            unsigned &basicBlockNum, unsigned
&VRegGapIndex) {
>> +
>> +  if (CanonicalizeBasicBlockNumber != ~0U) {
>> +    if (CanonicalizeBasicBlockNumber != basicBlockNum++)
>> +      return false;
>> +    DEBUG(dbgs() << "\n Canonicalizing BasicBlock "
<< MBB->getName() << "\n";);
>> +  }
>> +
>> +  if (std::find(bbNames.begin(), bbNames.end(), MBB->getName())
!>> +      bbNames.end()) {
>> +    DEBUG(dbgs() << "Found potentially duplicate
BasicBlocks: "
>> +                 << MBB->getName() << "\n";);
>> +    return false;
>> +  }
>> +
>> +  DEBUG(
>> +    dbgs() << "\n\n  NEW BASIC BLOCK: " <<
MBB->getName() << "  \n\n";
>> +    dbgs() <<
"\n\n================================================\n\n";
>> +  );
>> +
>> +  bool Changed = false;
>> +  MachineFunction &MF = *MBB->getParent();
>> +  MachineRegisterInfo &MRI = MF.getRegInfo();
>> +
>> +  const TargetRegisterClass *dummyRC = [] (const MachineFunction
&MF) {
>> +    const unsigned dummyVReg = GetDummyVReg(MF);
>> +    CANON_CHECK(dummyVReg != ~0U, "Could not find a dummy
VReg.");
>> +    const MachineRegisterInfo &MRI = MF.getRegInfo();
>> +    return MRI.getRegClass(dummyVReg);
>> +  } (MF);
>> +
>> +  bbNames.push_back(MBB->getName());
>> +  DEBUG(dbgs() << "\n\n NEW BASIC BLOCK: " <<
MBB->getName() << "\n\n";);
>> +
>> +  DEBUG(dbgs() << "MBB Before Scheduling:\n";
MBB->dump(););
>> +  Changed |= rescheduleCanoncally(MBB);
>> +  DEBUG(dbgs() << "MBB After Scheduling:\n";
MBB->dump(););
>> +
>> +  std::vector<MachineInstr *> candidates =
populateCandidates(MBB);
>> +  std::vector<MachineInstr *> visitedMIs;
>> +  std::copy(candidates.begin(), candidates.end(),
>> +            std::back_inserter(visitedMIs));
>> +
>> +  std::vector<typedRegStackEntry> VRegs;
>> +  for (auto candidate : candidates) {
>> +    VRegs.push_back(std::make_tuple(RSE_NewCandidate, ~0U));
>> +
>> +    std::queue<typedRegStackEntry> reg_queue;
>> +
>> +    // Here we walk the vreg operands of a non-root node along our
walk.
>> +    // The root nodes are the original candidates (stores normally).
>> +    // These are normally not the root nodes (except for the case of
copies to
>> +    // physical registers).
>> +    for (unsigned i = 1; i < candidate->getNumOperands(); i++) {
>> +      if (candidate->mayStore() || candidate->isBranch())
>> +        break;
>> +
>> +      MachineOperand &MO = candidate->getOperand(i);
>> +      if (!(MO.isReg() &&
TargetRegisterInfo::isVirtualRegister(MO.getReg())))
>> +        continue;
>> +
>> +      DEBUG(dbgs() << "Enqueue register"; MO.dump();
dbgs() << "\n";);
>> +      reg_queue.push(std::make_tuple(RSE_Reg, MO.getReg()));
>> +    }
>> +
>> +    // Here we walk the root candidates. We start from the 0th operand
because
>> +    // the root is normally a store to a vreg.
>> +    for (unsigned i = 0; i < candidate->getNumOperands(); i++) {
>> +
>> +      if (!candidate->mayStore() &&
!candidate->isBranch())
>> +        break;
>> +
>> +      MachineOperand &MO = candidate->getOperand(i);
>> +
>> +      // TODO: Do we want to only add vregs here?
>> +      if (!MO.isReg() && !MO.isFI())
>> +        continue;
>> +
>> +      DEBUG(dbgs() << "Enqueue Reg/FI"; MO.dump();
dbgs() << "\n";);
>> +
>> +      reg_queue.push(MO.isReg() ? std::make_tuple(RSE_Reg,
MO.getReg()) :
>> +                                  std::make_tuple(RSE_FrameIndex, 
~0U));
>> +    }
>> +
>> +    // Now that we have pushed the operands of the candidate here, we
do the
>> +    // full breadth first walk.
>> +    doCandidateWalk(VRegs, reg_queue, visitedMIs, MBB);
>> +  }
>> +
>> +  // If we have populated no vregs to rename then bail.
>> +  // The rest of this function does the vreg remaping.
>> +  if (VRegs.size() == 0)
>> +    return Changed;
>> +
>> +  // Skip some vregs, so we can recon where we'll land next.
>> +  SkipVRegs(VRegGapIndex, MRI, dummyRC);
>> +
>> +  std::map<unsigned, unsigned> vregRenameMap;
>> +  GetVRegRenameMap(vregRenameMap, VRegs, renamedInOtherBB, MRI,
dummyRC);
>> +
>> +  Changed |= doVRegRenaming(renamedInOtherBB, vregRenameMap, MRI);
>> +  Changed |= doDefKillClear(MBB);
>> +
>> +  DEBUG(dbgs() << "Updated MachineBasicBlock:\n";
MBB->dump(); dbgs() << "\n";);
>> +  DEBUG(dbgs() <<
"\n\n================================================\n\n");
>> +  return Changed;
>> +}
>> +
>> +bool MIRCanonicalizer::runOnMachineFunction(MachineFunction &MF) {
>> +
>> +  static unsigned functionNum = 0;
>> +  if (CanonicalizeFunctionNumber != ~0U) {
>> +    if (CanonicalizeFunctionNumber != functionNum++)
>> +      return false;
>> +    DEBUG(dbgs() << "\n Canonicalizing Function "
<< MF.getName() << "\n";);
>> +  }
>> +
>> +  // we need a valid vreg to create a vreg type for skipping all those
>> +  // stray vreg numbers so reach alignment/canonical vreg values.
>> +  std::vector<MachineBasicBlock*> RPOList = GetRPOList(MF);
>> +
>> +  DEBUG(
>> +    dbgs() << "\n\n  NEW MACHINE FUNCTION: " <<
MF.getName() << "  \n\n";
>> +    dbgs() <<
"\n\n================================================\n\n";
>> +    dbgs() << "Total Basic Blocks: " <<
RPOList.size() << "\n";
>> +    for (auto MBB : RPOList) {
>> +      dbgs() << MBB->getName() << "\n";
>> +    }
>> +    dbgs() <<
"\n\n================================================\n\n";
>> +  );
>> +
>> +  std::vector<StringRef> bbNames;
>> +  std::vector<unsigned> renamedInOtherBB;
>> +
>> +  unsigned gapIdx = 0;
>> +  unsigned bbNum = 0;
>> +
>> +  bool Changed = false;
>> +
>> +  for (auto MBB : RPOList)
>> +    Changed |= runOnBasicBlock(MBB, bbNames, renamedInOtherBB, bbNum,
gapIdx);
>> +
>> +  return Changed;
>> +}
>> +
>> diff --git a/tools/mir-canon/mir-canon.cpp
b/tools/mir-canon/mir-canon.cpp
>> new file mode 100644
>> index 00000000000..bfe53c01906
>> --- /dev/null
>> +++ b/tools/mir-canon/mir-canon.cpp
>> @@ -0,0 +1,175 @@
>> +//===------ mir-canon.cpp - Implement the LLVM Native Code Generator
------===//
>> +//
>> +//                     The LLVM Compiler Infrastructure
>> +//
>> +// This file is distributed under the University of Illinois Open
Source
>> +// License. See LICENSE.TXT for details.
>> +//
>>
+//===----------------------------------------------------------------------===//
>> +//
>> +// This is the mir-canon driver. It invokes the MIR canonicalization
pass.
>> +//
>>
+//===----------------------------------------------------------------------===//
>> +
>> +#include "llvm/CodeGen/CommandFlags.h"
>> +#include "llvm/CodeGen/TargetPassConfig.h"
>> +#include "llvm/CodeGen/MachineModuleInfo.h"
>> +#include "llvm/CodeGen/MachineFunctionPass.h"
>> +#include "llvm/CodeGen/MIRParser/MIRParser.h"
>> +
>> +#include "llvm/IR/LLVMContext.h"
>> +#include "llvm/IR/DiagnosticInfo.h"
>> +#include "llvm/IR/IRPrintingPasses.h"
>> +#include "llvm/IR/DiagnosticPrinter.h"
>> +#include "llvm/IR/LegacyPassManager.h"
>> +
>> +#include "llvm/Support/Signals.h"
>> +#include "llvm/Support/FileSystem.h"
>> +#include "llvm/Support/TargetSelect.h"
>> +#include "llvm/Support/TargetRegistry.h"
>> +#include "llvm/Support/ToolOutputFile.h"
>> +#include "llvm/Support/FormattedStream.h"
>> +#include "llvm/Support/PrettyStackTrace.h"
>> +
>> +#include "llvm/Target/TargetMachine.h"
>> +#include "llvm/Target/TargetSubtargetInfo.h"
>> +
>> +#include "Common.h"
>> +
>> +using namespace llvm;
>> +
>> +static cl::opt<std::string>
>> +    InputFilename(cl::Positional,
cl::desc("<input.mir>"), cl::init("-"));
>> +
>> +static cl::opt<std::string> OutputFilename("o",
cl::desc("Output filename"),
>> +                                          
cl::value_desc("canon.mir"));
>> +
>> +static cl::opt<std::string> TargetTriple("mtriple",
>> +                                         cl::desc("Override
target triple."));
>> +
>> +int main(int argc, char **argv) {
>> +  sys::PrintStackTraceOnErrorSignal(argv[0]);
>> +  PrettyStackTraceProgram X(argc, argv);
>> +
>> +  EnableDebugBuffering = true;
>> +
>> +  LLVMContext Context;
>> +  llvm_shutdown_obj Y;
>> +
>> +  // Initialize targets first, so that --version shows registered
targets.
>> +  InitializeAllTargets();
>> +  InitializeAllTargetMCs();
>> +  InitializeAllAsmPrinters();
>> +  InitializeAllAsmParsers();
>> +
>> +  PassRegistry *Registry = PassRegistry::getPassRegistry();
>> +  initializeCore(*Registry);
>> +  initializeCodeGen(*Registry);
>> +
>> +  initializeMIRCanonicalizerPass(*Registry);
>> +
>> +  // Register the target printer for --version.
>> + 
cl::AddExtraVersionPrinter(TargetRegistry::printRegisteredTargetsForVersion);
>> +
>> +  cl::ParseCommandLineOptions(argc, argv, "llvm system
compiler\n");
>> +
>> + 
CANON_CHECK(StringRef(InputFilename).endswith_lower(".mir"),
>> +              "Invalid Filename: " + InputFilename + ".
" +
>> +              "Expected a .mir file extension.");
>> +
>> +  // Set a diagnostic handler that doesn't exit on the first error
>> +  bool HasError = false;
>> +  Context.setDiagnosticHandler(
>> +    [](const DiagnosticInfo &DI, void *Context) {
>> +      if (DI.getSeverity() == DS_Error) {
>> +        bool *HasError = static_cast<bool *>(Context);
>> +        *HasError = true;
>> +      }
>> +
>> +      auto *Remark =
dyn_cast<DiagnosticInfoOptimizationBase>(&DI);
>> +      if (Remark && !Remark->isEnabled())
>> +        return;
>> +
>> +      DiagnosticPrinterRawOStream DP(errs());
>> +      errs() <<
LLVMContext::getDiagnosticMessagePrefix(DI.getSeverity());
>> +      errs() << ": ";
>> +      DI.print(DP);
>> +      errs() << "\n";
>> +    },
>> +    &HasError);
>> +
>> +  SMDiagnostic Err;
>> +  auto MIR = createMIRParserFromFile(InputFilename, Err, Context);
>> +  auto M = MIR ? MIR->parseIRModule() : nullptr;
>> +
>> +  CANON_CHECK(M, argv[0] + "Invalid Module or MIR.");
>> +
>> +  if (!TargetTriple.empty())
>> +    M->setTargetTriple(Triple::normalize(TargetTriple));
>> +
>> +  Triple TheTriple = Triple(M->getTargetTriple());
>> +  if (TheTriple.getTriple().empty())
>> +    TheTriple.setTriple(sys::getDefaultTargetTriple());
>> +
>> +  // Get the target specific parser.
>> +  std::string Error;
>> +  const Target *TheTarget >> +     
TargetRegistry::lookupTarget(MArch, TheTriple, Error);
>> +
>> +  CANON_CHECK(TheTarget, argv[0] + ": " + Error);
>> +
>> +  std::string CPUStr = getCPUStr();
>> +  std::string FeaturesStr = getFeaturesStr();
>> +
>> +  std::unique_ptr<TargetMachine>
Target(TheTarget->createTargetMachine(
>> +      TheTriple.getTriple(), CPUStr, FeaturesStr,
>> +      InitTargetOptionsFromCodeGenFlags(), getRelocModel(),
getCodeModel(),
>> +      CodeGenOpt::Default));
>> +
>> +  CANON_CHECK(Target, "Could not allocate target machine!");
>> +
>> +  std::error_code EC;
>> +  auto Out = llvm::make_unique<tool_output_file>(OutputFilename,
EC,
>> +                                                 sys::fs::F_Text);
>> +  CANON_CHECK(!EC, EC.message());
>> +  raw_pwrite_stream *OS = &Out->os();
>> +
>> +  legacy::PassManager PM;
>> +  TargetLibraryInfoImpl TLII(Triple(M->getTargetTriple()));
>> +  PM.add(new TargetLibraryInfoWrapperPass(TLII));
>> +  M->setDataLayout(Target->createDataLayout());
>> +  setFunctionAttributes(CPUStr, FeaturesStr, *M);
>> +
>> +  LLVMTargetMachine &TM = static_cast<LLVMTargetMachine
&>(*Target);
>> +  TargetPassConfig &TPC = *TM.createPassConfig(PM);
>> +  PM.add(&TPC);
>> +  MachineModuleInfo *MMI = new MachineModuleInfo(&TM);
>> +  PM.add(MMI);
>> +  TPC.printAndVerify("");
>> +
>> +  StringRef PassName = "mir-canonicalizer";
>> +  const PassRegistry *PR = PassRegistry::getPassRegistry();
>> +  const PassInfo *PI = PR->getPassInfo(PassName);
>> +
>> +  CANON_CHECK(PI, argv[0] + ": " + PI->getPassName() +
" not registered.");
>> +  CANON_CHECK(PI->getNormalCtor(),
>> +              argv[0] + ": can't create pass: " +
PI->getPassName());
>> +
>> +  Pass *P = PI->getNormalCtor()();
>> +  PM.add(P);
>> +  TPC.printAndVerify(std::string("After ") +
std::string(P->getPassName()));
>> +
>> +  PM.add(createPrintMIRPass(*OS));
>> +
>> +  // Before executing passes, print the final values of the LLVM
options.
>> +  cl::PrintOptionValues();
>> +
>> +  PM.run(*M);
>> +
>> +  CANON_CHECK(!(*static_cast<bool
*>(Context.getDiagnosticContext())),
>> +              argv[0] + " Error: exiting...");
>> +
>> +  Out->keep();
>> +  return 0;
>> +}
>> +
>> 
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> 
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Puyan Lotfi via llvm-dev

2017-Sep-06 15:08 UTC

head link

[llvm-dev] [RFC] mir-canon: A new tool for canonicalizing MIR for cleaner diffing.

On Tue, Aug 29, 2017 at 2:17 PM, Justin Bogner <mail at justinbogner.com>
wrote:
> Puyan Lotfi via llvm-dev <llvm-dev at lists.llvm.org> writes:
> > mir-canon: A new tool for canonicalizing MIR for cleaner diffing.
> >
> > Problem Statement and Context:
> >
> > Diff-tools are regularly used for comparing IR and assembly files.
This
> often
> > involves reasoning through differences that are semantically
equivalent
> and it
> > can be very time consuming for a person to do said reasoning.
> >
> > Specifically in the context of GlobalISel development there are
problems
> of
> > correctness verification. There is a need to compare two programs,
> compiled
> > from identical IR by two different instruction selectors in a way
where
> the
> > true differences stand out.
> >
> > Proposal:
> >
> > We propose a new tool that we have tentatively named
'mir-canon' that
> performs
> > canonical transformations on MIR. The goal is for MIR pre-processed
with
> > mir-canon to show fewer differences than if it were not pre-processed.
>
> This sounds great. One place I'm hoping to use this for is in ISel
> fuzzing - we can currently catch crashes and machine verifier errors,
> but comparing SDAG to GlobalISel is a space that I'd really like to
> explore.
>
Thats exactly what I'm trying to do. I'm still trying to think of good
ways
to narrow down the set of functions and basic blocks that get
canonicalized. I have some opt flags to control that but I think the
control could be better.

>
> > At the time of this writing we have a prototype canonicalization tool.
> We’ve
> > come up with some techniques that show promise and would like to open
> > discussion with the community to get feedback and suggestions on
> refining said
> > techniques. Currently we think of this as an open ended project.
> >
> > Techniques:
> >
> > Our prototype does the following for each basic block in a Reverse
Post
> > Ordering:
> >
> >   * Canonical instruction ordering is done by moving a given
instruction
> as
> >     close to the nearest use of its def as possible.
> >
> >   * Next, canonical VReg renaming is done by building a collection of
> >     candidate instructions that can be thought of as sinks in the
def-use
> >     graph: they are typically instructions that write to physical
> registers or
> >     store to memory. These candidates are used as the root of a
breadth
> first
> >     walk over the vreg operand def-use graph that determines a
canonical
> vreg
> >     ordering.
> >
> >   * Using said canonical vreg ordering we rename monotonically, but
> before we
> >     do this we skip several vreg values in order to increase the
chance
> that
> >     we land on the same vreg number for two different input MIR files.
>
> This is one place where handling multiple files at once would help,
> correct?
>
>I think so. I think if I were processing two files at once I could
increment the vreg count by a more exact number between basic blocks
processed and between def-use walks.

> >     We also do this to reduce the chances that a difference in
> >     previously def-use walks will affect the vreg renaming for
> >     subsequent walks. This skipping step could be thought of as a kind
> >     of vreg number reckoning: we skip modulo n vregs so that we are
> >     likely to land on the same vreg for two different files.
> >
> > This approach is completely agnostic of ISA specific semantics so it
> should
> > work for any target.
> >
> > Current status:
> >
> > At the moment we have a standalone llvm tool that uses a single pass
to
> do
> > the above described transformations. We have test inputs that show
> promise
> > but we still need a wider set of tests as well as hard metrics.
>
> This sounds like a great start, but I feel like this design won't quite
> scale to all the ways we want to use this.
>
> Consider, if this was a pass in the LLVM library rather than only
> accessible in a tool, you could potentially set it up to run at the end
> of an `llc` run. Similarly, if I want to use this in a fuzzer it would
> be nice to be able to do the canonicalization in memory.
>
> I used to have it as an LLVM pass but a while ago I thought it could be
more prudent to have contained in a separate tool as it wasn't actually for
optimization or codegen. I can easily reintegrate it as an LLVM pass. I'll
send out another patch.

> > Our approach processes a single file at a time. The primary benefit to
> this
> > approach is lower complexity in initial implementation and exploration
of
> > building such a tool. We are open to other approaches such as an
> llvm-diff
> > like (two file at a time) approach, but we have not explored that
avenue
> > fully yet.
> >
> > We’re eager to hear community feedback and will be ready to share
patches
> > for review soon.
>  ...
> > Patch for review.
>
> Please send the patch to llvm-commits (and refer back to this thread).
> Patch review is too much traffic for llvm-dev.
>
>Great, thanks!

> > diff --git a/tools/mir-canon/CMakeLists.txt
> b/tools/mir-canon/CMakeLists.txt
> > new file mode 100644
> > index 00000000000..eb6722e4bd4
> > --- /dev/null
> > +++ b/tools/mir-canon/CMakeLists.txt
> > @@ -0,0 +1,28 @@
> > +set(LLVM_LINK_COMPONENTS
> > +  ${LLVM_TARGETS_TO_BUILD}
> > +  Analysis
> > +  AsmPrinter
> > +  CodeGen
> > +  Core
> > +  IRReader
> > +  MC
> > +  MIRParser
> > +  ScalarOpts
> > +  SelectionDAG
> > +  Support
> > +  Target
> > +  TransformUtils
> > +  Vectorize
> > +  )
> > +
> > +# Support plugins.
> > +set(LLVM_NO_DEAD_STRIP 1)
> > +
> > +add_llvm_tool(mir-canon
> > +  mir-canon.cpp
> > +  MIRCanonicalizerPass.cpp
> > +
> > +  DEPENDS
> > +  intrinsics_gen
> > +  )
> > +export_executable_symbols(mir-canon)
> > diff --git a/tools/mir-canon/Common.h b/tools/mir-canon/Common.h
> > new file mode 100644
> > index 00000000000..3e6be7ef643
> > --- /dev/null
> > +++ b/tools/mir-canon/Common.h
> > @@ -0,0 +1,21 @@
> > +#include <cassert>
> > +
> > +namespace llvm {
> > +/// The purpose of this pass is to establish a canonical operand
naming
> so
> > +/// that code compiled with slightly different IR passes can be
diffed
> more
> > +/// effectively than otherwise. This is done by renaming vregs in a
> given
> > +/// LiveRange in a canonical way.
> > +extern char &MIRCanonicalizerID;
> > +
> > +void initializeMIRCanonicalizerPass(PassRegistry &);
> > +
> > +} // namespace llvm
> > +
> > +#define CANON_CHECK(CHK, MSG) \
> > +  do { \
> > +    if ((CHK)) break; \
> > +    errs() << (std::string("") + MSG) <<
'\n'; \
> > +    assert(false && "CANON_CHECK FAILURE."); \
> > +    exit(EXIT_FAILURE); \
> > +  } while (false)
> > +
> > diff --git a/tools/mir-canon/LLVMBuild.txt
b/tools/mir-canon/LLVMBuild.
> txt
> > new file mode 100644
> > index 00000000000..5d3e6bb5a91
> > --- /dev/null
> > +++ b/tools/mir-canon/LLVMBuild.txt
> > @@ -0,0 +1,22 @@
> > +;===- ./tools/mir-canon/LLVMBuild.txt --------------------------*-
> Conf -*--===;
> > +;
> > +;                     The LLVM Compiler Infrastructure
> > +;
> > +; This file is distributed under the University of Illinois Open
Source
> > +; License. See LICENSE.TXT for details.
> > +;
> > +;===-------------------------------------------------------
> -----------------===;
> > +;
> > +; This is an LLVMBuild description file for the components in this
> subdirectory.
> > +;
> > +; For more information on the LLVMBuild system, please see:
> > +;
> > +;   http://llvm.org/docs/LLVMBuild.html
> > +;
> > +;===-------------------------------------------------------
> -----------------===;
> > +
> > +[component_0]
> > +type = Tool
> > +name = mir-canon
> > +parent = Tools
> > +required_libraries = AsmParser BitReader IRReader MIRParser
> TransformUtils Scalar Vectorize all-targets
> > diff --git a/tools/mir-canon/MIRCanonicalizerPass.cpp
b/tools/mir-canon/
> MIRCanonicalizerPass.cpp
> > new file mode 100644
> > index 00000000000..a4a0a7a0e5f
> > --- /dev/null
> > +++ b/tools/mir-canon/MIRCanonicalizerPass.cpp
> > @@ -0,0 +1,588 @@
> > +//===-------------- MIRCanonicalizer.cpp - MIR Canonicalizer
> --------------===//
> > +//
> > +//                     The LLVM Compiler Infrastructure
> > +//
> > +// This file is distributed under the University of Illinois Open
Source
> > +// License. See LICENSE.TXT for details.
> > +//
> > +//===------------------------------------------------------
> ----------------===//
> > +//
> > +// Reorders instructions canonically.
> > +// Renames virtual register operands canonically.
> > +// Strips certain MIR artifacts (optionally).
> > +//
> > +//===------------------------------------------------------
> ----------------===//
> > +
> > +#include "llvm/CodeGen/Passes.h"
> > +#include "llvm/CodeGen/MachineInstrBuilder.h"
> > +#include "llvm/CodeGen/MachineFunctionPass.h"
> > +#include "llvm/CodeGen/MachineRegisterInfo.h"
> > +
> > +#include "llvm/Support/raw_ostream.h"
> > +#include "llvm/ADT/PostOrderIterator.h"
> > +#include "llvm/Target/TargetInstrInfo.h"
> > +
> > +#include <tuple>
> > +#include <queue>
> > +
> > +using namespace llvm;
> > +
> > +#include "Common.h"
> > +
> > +#define DEBUG_TYPE "mir-canonicalizer"
> > +
> > +static cl::opt<unsigned>
> > +CanonicalizeFunctionNumber("canon-nth-function",
cl::Hidden,
> cl::init(~0u),
> > +                           cl::value_desc("N"),
> > +                           cl::desc("Function number to
> canonicalize."));
> > +
> > +static cl::opt<unsigned>
> > +CanonicalizeBasicBlockNumber("canon-nth-basicblock",
cl::Hidden,
> cl::init(~0u),
> > +                             cl::value_desc("N"),
> > +                             cl::desc("BasicBlock number to
> canonicalize."));
> > +
> > +namespace {
> > +
> > +class MIRCanonicalizer : public MachineFunctionPass {
> > +public:
> > +  static char ID;
> > +  MIRCanonicalizer() : MachineFunctionPass(ID) {}
> > +
> > +  StringRef getPassName() const override {
> > +    return "Rename register operands in a canonical
ordering.";
> > +  }
> > +
> > +  void getAnalysisUsage(AnalysisUsage &AU) const override {
> > +    AU.setPreservesCFG();
> > +    MachineFunctionPass::getAnalysisUsage(AU);
> > +  }
> > +
> > +  bool runOnMachineFunction(MachineFunction &MF) override;
> > +};
> > +
> > +} // end anonymous namespace
> > +
> > +enum RSType { RSE_Reg = 0, RSE_FrameIndex, RSE_NewCandidate };
> > +typedef std::tuple<RSType, unsigned> typedRegStackEntry;
> > +
> > +char MIRCanonicalizer::ID;
> > +
> > +char &llvm::MIRCanonicalizerID = MIRCanonicalizer::ID;
> > +
> > +INITIALIZE_PASS_BEGIN(MIRCanonicalizer,
"mir-canonicalizer",
> > +                      "Rename Register Operands
Canonically", false,
> false);
> > +
> > +INITIALIZE_PASS_END(MIRCanonicalizer, "mir-canonicalizer",
> > +                    "Rename Register Operands Canonically",
false,
> false);
> > +
> > +static std::vector<MachineBasicBlock *>
GetRPOList(MachineFunction &MF)
> {
> > +  MachineBasicBlock *Entry = &*MF.begin();
> > +  std::vector<MachineBasicBlock*> RPOList;
> > +  ReversePostOrderTraversal<MachineBasicBlock*> RPOT(Entry);
> > +  for
(ReversePostOrderTraversal<MachineBasicBlock*>::rpo_iterator
> > +       MBBI = RPOT.begin(), MBBE = RPOT.end(); MBBI != MBBE; ++MBBI)
{
> > +    MachineBasicBlock *MBB = *MBBI;
> > +    RPOList.push_back(MBB);
> > +  }
> > +
> > +  return RPOList;
> > +}
> > +
> > +// Set a dummy vreg. We use this vregs register class to generate
> throw-away
> > +// vregs that are used to skip vreg numbers so that vreg numbers line
> up.
> > +static unsigned GetDummyVReg(const MachineFunction &MF) {
> > +  for (auto &MBB : MF) {
> > +    for (auto II = MBB.begin(), IE = MBB.end(); II != IE; ++II) {
> > +      const MachineInstr *MI = &*II;
> > +      for (unsigned i = 0; i < MI->getNumOperands(); i++) {
> > +        const MachineOperand &MO = MI->getOperand(i);
> > +        if (!MO.isReg() || !TargetRegisterInfo::
> isVirtualRegister(MO.getReg()))
> > +          continue;
> > +        return MO.getReg();
> > +      }
> > +    }
> > +  }
> > +
> > +  return ~0U;
> > +}
> > +
> > +static bool rescheduleCanoncally(MachineBasicBlock *MBB) {
> > +
> > +  bool Changed = false;
> > +
> > +  // Calculates the distance of MI from the begining of its parent
BB.
> > +  auto getInstrIdx = [](const MachineInstr &MI) {
> > +    unsigned i = 0;
> > +    for (auto &I : *MI.getParent()) {
> > +      if (&I == &MI)
> > +        return i;
> > +      i++;
> > +    }
> > +    return ~0U;
> > +  };
> > +
> > +  // Pre-Populate vector of instructions to reschedule so that we
don't
> > +  // clobber the iterator.
> > +  std::vector<MachineInstr *> instructions;
> > +  for (auto II = MBB->begin(); II != MBB->end(); ++II) {
> > +    instructions.push_back(&*II);
> > +  }
> > +
> > +  for (auto II : instructions) {
> > +    if (II->getNumOperands() == 0)
> > +      continue;
> > +
> > +    MachineOperand &MO = II->getOperand(0);
> > +    if (!MO.isReg() || !TargetRegisterInfo::
> isVirtualRegister(MO.getReg()))
> > +      continue;
> > +
> > +    DEBUG(dbgs() << "Operand " << 0 <<
" of "; II->dump(); MO.dump(););
> > +
> > +    MachineInstr *def = II;
> > +    unsigned distance = ~0U;
> > +    MachineInstr *useToBringDefCloserTo = nullptr;
> > +    MachineRegisterInfo *MRI =
&MBB->getParent()->getRegInfo();
> > +    for (auto UI = MRI->use_begin(MO.getReg()), UE =
MRI->use_end();
> > +         UI != UE; ++UI) {
> > +      MachineInstr *useInst = UI->getParent();
> > +
> > +      const unsigned defLoc = getInstrIdx(*def);
> > +      const unsigned useLoc = getInstrIdx(*useInst);
> > +      const unsigned delta = (useLoc - defLoc);
> > +
> > +      if (useInst->getParent() != def->getParent())
> > +        continue;
> > +      if (defLoc >= useLoc)
> > +        continue;
> > +
> > +      if (delta < distance) {
> > +        distance = delta;
> > +        useToBringDefCloserTo = useInst;
> > +      }
> > +    }
> > +
> > +    MachineBasicBlock::iterator defI = MBB->instr_end();
> > +    MachineBasicBlock::iterator useI = MBB->instr_end();
> > +
> > +    for (auto BBI = MBB->instr_begin(); BBI !=
MBB->instr_end(); ++BBI)
> {
> > +
> > +      if (defI != MBB->instr_end() && useI !=
MBB->instr_end())
> > +        break;
> > +
> > +      if ((&*BBI != def) && (&*BBI !=
useToBringDefCloserTo))
> > +        continue;
> > +
> > +      if (&*BBI == def) {
> > +        defI = BBI;
> > +        continue;
> > +      }
> > +
> > +      if (&*BBI == useToBringDefCloserTo) {
> > +        useI = BBI;
> > +        continue;
> > +      }
> > +    }
> > +
> > +    if (defI == MBB->instr_end() || useI == MBB->instr_end())
> > +      continue;
> > +
> > +    DEBUG(dbgs() << "Splicing "; defI->dump();
dbgs() << " right
> before: ";
> > +          useI->dump(););
> > +
> > +    Changed = true;
> > +    MBB->splice(useI, MBB, defI);
> > +  }
> > +
> > +  return Changed;
> > +}
> > +
> > +static std::vector<MachineInstr *>
populateCandidates(MachineBasicBlock
> *MBB) {
> > +  std::vector<MachineInstr *> candidates;
> > +  MachineRegisterInfo &MRI =
MBB->getParent()->getRegInfo();
> > +
> > +  for (auto II = MBB->begin(), IE = MBB->end(); II != IE; ++II)
{
> > +    MachineInstr *MI = &*II;
> > +
> > +    bool doesMISideEffect = false;
> > +
> > +    if (MI->getOperand(0).isReg()) {
> > +      const unsigned dst = MI->getOperand(0).getReg();
> > +      doesMISideEffect |=
!TargetRegisterInfo::isVirtualRegister(dst);
> > +
> > +      for (auto UI = MRI.use_begin(dst); UI != MRI.use_end(); ++UI) {
> > +        if (doesMISideEffect) break;
> > +        doesMISideEffect |= (UI->getParent()->getParent() !>
MI->getParent());
> > +      }
> > +    }
> > +
> > +    if (!MI->mayStore() && !MI->isBranch() &&
!doesMISideEffect)
> > +      continue;
> > +
> > +    DEBUG(dbgs() << "Found Candidate:  ";
MI->dump(););
> > +    candidates.push_back(MI);
> > +  }
> > +
> > +  return candidates;
> > +}
> > +
> > +void doCandidateWalk(std::vector<typedRegStackEntry>
&VRegs,
> > +                     std::queue <typedRegStackEntry>
&reg_queue,
> > +                     std::vector<MachineInstr *>
&visitedMIs,
> > +                     const MachineBasicBlock *MBB) {
> > +
> > +  const MachineFunction &MF = *MBB->getParent();
> > +  const MachineRegisterInfo &MRI = MF.getRegInfo();
> > +
> > +  while (reg_queue.size()) {
> > +    auto regType = std::get<0>(reg_queue.front());
> > +    auto reg = std::get<1>(reg_queue.front());
> > +    reg_queue.pop();
> > +
> > +    if (regType == RSE_FrameIndex) {
> > +      DEBUG(dbgs() << "Popping frame index.\n";);
> > +      VRegs.push_back(std::make_tuple(RSE_FrameIndex, ~0U));
> > +      continue;
> > +    }
> > +
> > +    assert(regType == RSE_Reg && "Expected vreg or
physreg.");
> > +
> > +    if (TargetRegisterInfo::isVirtualRegister(reg)) {
> > +      DEBUG(dbgs() << "Popping vreg ";
> > +          MRI.def_begin(reg)->dump();
> > +          dbgs() << "\n";);
> > +
> > +      bool hasVisitedVReg = false;
> > +      for (auto vreg : VRegs) {
> > +        if (std::get<1>(vreg) == reg) {
> > +          hasVisitedVReg = true;
> > +          break;
> > +        }
> > +      }
> > +
> > +      if (!hasVisitedVReg)
> > +        VRegs.push_back(std::make_tuple(RSE_Reg, reg));
> > +    } else {
> > +      DEBUG(dbgs() << "Popping physreg.\n";);
> > +      VRegs.push_back(std::make_tuple(RSE_Reg, reg));
> > +      continue;
> > +    }
> > +
> > +    for (auto RI = MRI.def_begin(reg), RE = MRI.def_end(); RI != RE;
> ++RI) {
> > +      MachineInstr *def = RI->getParent();
> > +
> > +      if (def->getParent() != MBB)
> > +        continue;
> > +
> > +      bool hasVisited = false;
> > +      for (auto VMI : visitedMIs) {
> > +        if (def == VMI) {
> > +          hasVisited = true;
> > +          break;
> > +        }
> > +      }
> > +
> > +      if (hasVisited)
> > +        break;
> > +
> > +      DEBUG(dbgs() << "\n========================\n";
> > +          dbgs() << "Visited MI: "; def->dump();
> > +          dbgs() << "BB Name: " <<
def->getParent()->getName() << "\n";
> > +          dbgs() << "\n========================\n";);
> > +      visitedMIs.push_back(def);
> > +      for (unsigned I = 1, E = def->getNumOperands(); I != E; ++I)
{
> > +
> > +        MachineOperand &MO = def->getOperand(I);
> > +        if (MO.isFI()) {
> > +          DEBUG(dbgs() << "Pushing frame index.\n";);
> > +          reg_queue.push(std::make_tuple(RSE_FrameIndex, ~0U));
> > +        }
> > +
> > +        if (!MO.isReg())
> > +          continue;
> > +        reg_queue.push(std::make_tuple(RSE_Reg, MO.getReg()));
> > +      }
> > +    }
> > +  }
> > +}
> > +
> > +static void SkipVRegs(unsigned &VRegGapIndex, MachineRegisterInfo
&MRI,
> > +                      const TargetRegisterClass *RC) {
> > +  const unsigned VR_GAP = (++VRegGapIndex * 1000);
> > +
> > +  DEBUG(dbgs() << "Adjusting per-BB VR_GAP for BB"
<< VRegGapIndex << "
> to "
> > +               << VR_GAP << "\n";);
> > +
> > +  unsigned I = MRI.createVirtualRegister(RC);
> > +  const unsigned E = (((I + VR_GAP) / VR_GAP) + 1) * VR_GAP;
> > +  while (I != E) {
> > +    I = MRI.createVirtualRegister(RC);
> > +  }
> > +
> > +  CANON_CHECK(I != ~0U && I != 0, "Invalid VReg");
> > +}
> > +
> > +static void GetVRegRenameMap(std::map<unsigned, unsigned>
> &vregRenameMap,
> > +                             const
std::vector<typedRegStackEntry>
> &VRegs,
> > +                             const std::vector<unsigned>
> &renamedInOtherBB,
> > +                             MachineRegisterInfo &MRI,
> > +                             const TargetRegisterClass *RC) {
> > +
> > +  unsigned lastRenameReg = MRI.createVirtualRegister(RC);
> > +
> > +  bool firstCandidate = true;
> > +  for (auto vreg : VRegs) {
> > +
> > +    auto regType = std::get<0>(vreg);
> > +    auto reg = std::get<1>(vreg);
> > +
> > +    if (regType == RSE_FrameIndex) {
> > +      lastRenameReg = MRI.createVirtualRegister(RC);
> > +      DEBUG(dbgs() << "Skipping rename for FI "
<< lastRenameReg <<
> "\n";);
> > +      continue;
> > +    } else if (regType == RSE_NewCandidate) {
> > +      if (!firstCandidate) {
> > +        while (lastRenameReg % 10) {
> > +          lastRenameReg = MRI.createVirtualRegister(RC);
> > +
> > +          DEBUG(dbgs() << "Skipping rename for new
candidate " <<
> lastRenameReg
> > +                       << "\n";);
> > +        }
> > +      }
> > +      firstCandidate = false;
> > +      continue;
> > +    } else if (!TargetRegisterInfo::isVirtualRegister(reg)) {
> > +      lastRenameReg = MRI.createVirtualRegister(RC);
> > +      DEBUG(dbgs() << "Skipping rename for Phys Reg "
<< lastRenameReg
> << "\n";);
> > +      continue;
> > +    }
> > +
> > +    if (std::find(renamedInOtherBB.begin(), renamedInOtherBB.end(),
> reg) !> > +        renamedInOtherBB.end()) {
> > +      DEBUG(dbgs() << "Vreg " << reg <<
" already renamed in other
> BB.\n";);
> > +      continue;
> > +    }
> > +
> > +    auto rename = MRI.createVirtualRegister(MRI.getRegClass(reg));
> > +    lastRenameReg = rename;
> > +
> > +    if (vregRenameMap.find(reg) == vregRenameMap.end()) {
> > +      DEBUG(dbgs() << "Mapping vreg ";);
> > +      if (MRI.reg_begin(reg) != MRI.reg_end()) {
> > +        DEBUG(auto foo = &*MRI.reg_begin(reg); foo->dump(););
> > +      } else {
> > +        DEBUG(dbgs() << reg;);
> > +      }
> > +      DEBUG(dbgs() << " to ";);
> > +      if (MRI.reg_begin(rename) != MRI.reg_end()) {
> > +        DEBUG(auto foo = &*MRI.reg_begin(rename);
foo->dump(););
> > +      } else {
> > +        DEBUG(dbgs() << rename;);
> > +      }
> > +      DEBUG(dbgs() << "\n";);
> > +
> > +      vregRenameMap.insert(std::pair<unsigned, unsigned>(reg,
rename));
> > +    }
> > +  }
> > +}
> > +
> > +static bool doVRegRenaming(std::vector<unsigned>
&renamedInOtherBB,
> > +                           const std::map<unsigned, unsigned>
> &vregRenameMap,
> > +                           MachineRegisterInfo &MRI) {
> > +  bool Changed = false;
> > +  for (auto I = vregRenameMap.begin(), E = vregRenameMap.end(); I !=
E;
> ++I) {
> > +
> > +    auto vreg = I->first;
> > +    auto rename = I->second;
> > +
> > +    renamedInOtherBB.push_back(rename);
> > +
> > +    std::vector<MachineOperand *> renameMOs;
> > +    for (MachineOperand &MO : MRI.reg_operands(vreg)) {
> > +      renameMOs.push_back(&MO);
> > +    }
> > +
> > +    for (auto MO : renameMOs) {
> > +      Changed = true;
> > +      MO->setReg(rename);
> > +
> > +      if (!MO->isDef())
> > +        MO->setIsKill(false);
> > +    }
> > +  }
> > +
> > +  return Changed;
> > +}
> > +
> > +static bool doDefKillClear(MachineBasicBlock *MBB) {
> > +  bool Changed = false;
> > +
> > +  for (auto II = MBB->begin(), IE = MBB->end(); II != IE; ++II)
{
> > +    MachineInstr *MI = &*II;
> > +
> > +    for (unsigned i = 0; i < MI->getNumOperands(); i++) {
> > +      MachineOperand &MO = MI->getOperand(i);
> > +      if (!MO.isReg())
> > +        continue;
> > +      if (!MO.isDef() && MO.isKill()) {
> > +        Changed = true;
> > +        MO.setIsKill(false);
> > +      }
> > +
> > +      if (MO.isDef() && MO.isDead()) {
> > +        Changed = true;
> > +        MO.setIsDead(false);
> > +      }
> > +    }
> > +  }
> > +
> > +  return Changed;
> > +}
> > +
> > +static bool runOnBasicBlock(MachineBasicBlock *MBB,
> > +                            std::vector<StringRef>
&bbNames,
> > +                            std::vector<unsigned>
&renamedInOtherBB,
> > +                            unsigned &basicBlockNum, unsigned
> &VRegGapIndex) {
> > +
> > +  if (CanonicalizeBasicBlockNumber != ~0U) {
> > +    if (CanonicalizeBasicBlockNumber != basicBlockNum++)
> > +      return false;
> > +    DEBUG(dbgs() << "\n Canonicalizing BasicBlock "
<< MBB->getName()
> << "\n";);
> > +  }
> > +
> > +  if (std::find(bbNames.begin(), bbNames.end(), MBB->getName())
!> > +      bbNames.end()) {
> > +    DEBUG(dbgs() << "Found potentially duplicate
BasicBlocks: "
> > +                 << MBB->getName() <<
"\n";);
> > +    return false;
> > +  }
> > +
> > +  DEBUG(
> > +    dbgs() << "\n\n  NEW BASIC BLOCK: " <<
MBB->getName() << "  \n\n";
> > +    dbgs() << "\n\n========================>
=======================\n\n";
> > +  );
> > +
> > +  bool Changed = false;
> > +  MachineFunction &MF = *MBB->getParent();
> > +  MachineRegisterInfo &MRI = MF.getRegInfo();
> > +
> > +  const TargetRegisterClass *dummyRC = [] (const MachineFunction
&MF) {
> > +    const unsigned dummyVReg = GetDummyVReg(MF);
> > +    CANON_CHECK(dummyVReg != ~0U, "Could not find a dummy
VReg.");
> > +    const MachineRegisterInfo &MRI = MF.getRegInfo();
> > +    return MRI.getRegClass(dummyVReg);
> > +  } (MF);
> > +
> > +  bbNames.push_back(MBB->getName());
> > +  DEBUG(dbgs() << "\n\n NEW BASIC BLOCK: " <<
MBB->getName() <<
> "\n\n";);
> > +
> > +  DEBUG(dbgs() << "MBB Before Scheduling:\n";
MBB->dump(););
> > +  Changed |= rescheduleCanoncally(MBB);
> > +  DEBUG(dbgs() << "MBB After Scheduling:\n";
MBB->dump(););
> > +
> > +  std::vector<MachineInstr *> candidates =
populateCandidates(MBB);
> > +  std::vector<MachineInstr *> visitedMIs;
> > +  std::copy(candidates.begin(), candidates.end(),
> > +            std::back_inserter(visitedMIs));
> > +
> > +  std::vector<typedRegStackEntry> VRegs;
> > +  for (auto candidate : candidates) {
> > +    VRegs.push_back(std::make_tuple(RSE_NewCandidate, ~0U));
> > +
> > +    std::queue<typedRegStackEntry> reg_queue;
> > +
> > +    // Here we walk the vreg operands of a non-root node along our
walk.
> > +    // The root nodes are the original candidates (stores normally).
> > +    // These are normally not the root nodes (except for the case of
> copies to
> > +    // physical registers).
> > +    for (unsigned i = 1; i < candidate->getNumOperands(); i++)
{
> > +      if (candidate->mayStore() || candidate->isBranch())
> > +        break;
> > +
> > +      MachineOperand &MO = candidate->getOperand(i);
> > +      if (!(MO.isReg() && TargetRegisterInfo::
> isVirtualRegister(MO.getReg())))
> > +        continue;
> > +
> > +      DEBUG(dbgs() << "Enqueue register"; MO.dump();
dbgs() << "\n";);
> > +      reg_queue.push(std::make_tuple(RSE_Reg, MO.getReg()));
> > +    }
> > +
> > +    // Here we walk the root candidates. We start from the 0th
operand
> because
> > +    // the root is normally a store to a vreg.
> > +    for (unsigned i = 0; i < candidate->getNumOperands(); i++)
{
> > +
> > +      if (!candidate->mayStore() &&
!candidate->isBranch())
> > +        break;
> > +
> > +      MachineOperand &MO = candidate->getOperand(i);
> > +
> > +      // TODO: Do we want to only add vregs here?
> > +      if (!MO.isReg() && !MO.isFI())
> > +        continue;
> > +
> > +      DEBUG(dbgs() << "Enqueue Reg/FI"; MO.dump();
dbgs() << "\n";);
> > +
> > +      reg_queue.push(MO.isReg() ? std::make_tuple(RSE_Reg,
MO.getReg())
> :
> > +                                  std::make_tuple(RSE_FrameIndex,
> ~0U));
> > +    }
> > +
> > +    // Now that we have pushed the operands of the candidate here, we
> do the
> > +    // full breadth first walk.
> > +    doCandidateWalk(VRegs, reg_queue, visitedMIs, MBB);
> > +  }
> > +
> > +  // If we have populated no vregs to rename then bail.
> > +  // The rest of this function does the vreg remaping.
> > +  if (VRegs.size() == 0)
> > +    return Changed;
> > +
> > +  // Skip some vregs, so we can recon where we'll land next.
> > +  SkipVRegs(VRegGapIndex, MRI, dummyRC);
> > +
> > +  std::map<unsigned, unsigned> vregRenameMap;
> > +  GetVRegRenameMap(vregRenameMap, VRegs, renamedInOtherBB, MRI,
> dummyRC);
> > +
> > +  Changed |= doVRegRenaming(renamedInOtherBB, vregRenameMap, MRI);
> > +  Changed |= doDefKillClear(MBB);
> > +
> > +  DEBUG(dbgs() << "Updated MachineBasicBlock:\n";
MBB->dump(); dbgs()
> << "\n";);
> > +  DEBUG(dbgs() << "\n\n========================>
=======================\n\n");
> > +  return Changed;
> > +}
> > +
> > +bool MIRCanonicalizer::runOnMachineFunction(MachineFunction &MF)
{
> > +
> > +  static unsigned functionNum = 0;
> > +  if (CanonicalizeFunctionNumber != ~0U) {
> > +    if (CanonicalizeFunctionNumber != functionNum++)
> > +      return false;
> > +    DEBUG(dbgs() << "\n Canonicalizing Function "
<< MF.getName() <<
> "\n";);
> > +  }
> > +
> > +  // we need a valid vreg to create a vreg type for skipping all
those
> > +  // stray vreg numbers so reach alignment/canonical vreg values.
> > +  std::vector<MachineBasicBlock*> RPOList = GetRPOList(MF);
> > +
> > +  DEBUG(
> > +    dbgs() << "\n\n  NEW MACHINE FUNCTION: " <<
MF.getName() << "
> \n\n";
> > +    dbgs() << "\n\n========================>
=======================\n\n";
> > +    dbgs() << "Total Basic Blocks: " <<
RPOList.size() << "\n";
> > +    for (auto MBB : RPOList) {
> > +      dbgs() << MBB->getName() << "\n";
> > +    }
> > +    dbgs() << "\n\n========================>
=======================\n\n";
> > +  );
> > +
> > +  std::vector<StringRef> bbNames;
> > +  std::vector<unsigned> renamedInOtherBB;
> > +
> > +  unsigned gapIdx = 0;
> > +  unsigned bbNum = 0;
> > +
> > +  bool Changed = false;
> > +
> > +  for (auto MBB : RPOList)
> > +    Changed |= runOnBasicBlock(MBB, bbNames, renamedInOtherBB, bbNum,
> gapIdx);
> > +
> > +  return Changed;
> > +}
> > +
> > diff --git a/tools/mir-canon/mir-canon.cpp
b/tools/mir-canon/mir-canon.
> cpp
> > new file mode 100644
> > index 00000000000..bfe53c01906
> > --- /dev/null
> > +++ b/tools/mir-canon/mir-canon.cpp
> > @@ -0,0 +1,175 @@
> > +//===------ mir-canon.cpp - Implement the LLVM Native Code Generator
> ------===//
> > +//
> > +//                     The LLVM Compiler Infrastructure
> > +//
> > +// This file is distributed under the University of Illinois Open
Source
> > +// License. See LICENSE.TXT for details.
> > +//
> > +//===------------------------------------------------------
> ----------------===//
> > +//
> > +// This is the mir-canon driver. It invokes the MIR canonicalization
> pass.
> > +//
> > +//===------------------------------------------------------
> ----------------===//
> > +
> > +#include "llvm/CodeGen/CommandFlags.h"
> > +#include "llvm/CodeGen/TargetPassConfig.h"
> > +#include "llvm/CodeGen/MachineModuleInfo.h"
> > +#include "llvm/CodeGen/MachineFunctionPass.h"
> > +#include "llvm/CodeGen/MIRParser/MIRParser.h"
> > +
> > +#include "llvm/IR/LLVMContext.h"
> > +#include "llvm/IR/DiagnosticInfo.h"
> > +#include "llvm/IR/IRPrintingPasses.h"
> > +#include "llvm/IR/DiagnosticPrinter.h"
> > +#include "llvm/IR/LegacyPassManager.h"
> > +
> > +#include "llvm/Support/Signals.h"
> > +#include "llvm/Support/FileSystem.h"
> > +#include "llvm/Support/TargetSelect.h"
> > +#include "llvm/Support/TargetRegistry.h"
> > +#include "llvm/Support/ToolOutputFile.h"
> > +#include "llvm/Support/FormattedStream.h"
> > +#include "llvm/Support/PrettyStackTrace.h"
> > +
> > +#include "llvm/Target/TargetMachine.h"
> > +#include "llvm/Target/TargetSubtargetInfo.h"
> > +
> > +#include "Common.h"
> > +
> > +using namespace llvm;
> > +
> > +static cl::opt<std::string>
> > +    InputFilename(cl::Positional,
cl::desc("<input.mir>"),
> cl::init("-"));
> > +
> > +static cl::opt<std::string> OutputFilename("o",
cl::desc("Output
> filename"),
> > +                                          
cl::value_desc("canon.mir"));
> > +
> > +static cl::opt<std::string> TargetTriple("mtriple",
> > +                                         cl::desc("Override
target
> triple."));
> > +
> > +int main(int argc, char **argv) {
> > +  sys::PrintStackTraceOnErrorSignal(argv[0]);
> > +  PrettyStackTraceProgram X(argc, argv);
> > +
> > +  EnableDebugBuffering = true;
> > +
> > +  LLVMContext Context;
> > +  llvm_shutdown_obj Y;
> > +
> > +  // Initialize targets first, so that --version shows registered
> targets.
> > +  InitializeAllTargets();
> > +  InitializeAllTargetMCs();
> > +  InitializeAllAsmPrinters();
> > +  InitializeAllAsmParsers();
> > +
> > +  PassRegistry *Registry = PassRegistry::getPassRegistry();
> > +  initializeCore(*Registry);
> > +  initializeCodeGen(*Registry);
> > +
> > +  initializeMIRCanonicalizerPass(*Registry);
> > +
> > +  // Register the target printer for --version.
> > +  cl::AddExtraVersionPrinter(TargetRegistry::
> printRegisteredTargetsForVersion);
> > +
> > +  cl::ParseCommandLineOptions(argc, argv, "llvm system
compiler\n");
> > +
> > + 
CANON_CHECK(StringRef(InputFilename).endswith_lower(".mir"),
> > +              "Invalid Filename: " + InputFilename +
". " +
> > +              "Expected a .mir file extension.");
> > +
> > +  // Set a diagnostic handler that doesn't exit on the first
error
> > +  bool HasError = false;
> > +  Context.setDiagnosticHandler(
> > +    [](const DiagnosticInfo &DI, void *Context) {
> > +      if (DI.getSeverity() == DS_Error) {
> > +        bool *HasError = static_cast<bool *>(Context);
> > +        *HasError = true;
> > +      }
> > +
> > +      auto *Remark =
dyn_cast<DiagnosticInfoOptimizationBase>(&DI);
> > +      if (Remark && !Remark->isEnabled())
> > +        return;
> > +
> > +      DiagnosticPrinterRawOStream DP(errs());
> > +      errs() << LLVMContext::getDiagnosticMessagePrefix(DI.
> getSeverity());
> > +      errs() << ": ";
> > +      DI.print(DP);
> > +      errs() << "\n";
> > +    },
> > +    &HasError);
> > +
> > +  SMDiagnostic Err;
> > +  auto MIR = createMIRParserFromFile(InputFilename, Err, Context);
> > +  auto M = MIR ? MIR->parseIRModule() : nullptr;
> > +
> > +  CANON_CHECK(M, argv[0] + "Invalid Module or MIR.");
> > +
> > +  if (!TargetTriple.empty())
> > +    M->setTargetTriple(Triple::normalize(TargetTriple));
> > +
> > +  Triple TheTriple = Triple(M->getTargetTriple());
> > +  if (TheTriple.getTriple().empty())
> > +    TheTriple.setTriple(sys::getDefaultTargetTriple());
> > +
> > +  // Get the target specific parser.
> > +  std::string Error;
> > +  const Target *TheTarget > > +     
TargetRegistry::lookupTarget(MArch, TheTriple, Error);
> > +
> > +  CANON_CHECK(TheTarget, argv[0] + ": " + Error);
> > +
> > +  std::string CPUStr = getCPUStr();
> > +  std::string FeaturesStr = getFeaturesStr();
> > +
> > +  std::unique_ptr<TargetMachine>
Target(TheTarget->createTargetMachine(
> > +      TheTriple.getTriple(), CPUStr, FeaturesStr,
> > +      InitTargetOptionsFromCodeGenFlags(), getRelocModel(),
> getCodeModel(),
> > +      CodeGenOpt::Default));
> > +
> > +  CANON_CHECK(Target, "Could not allocate target
machine!");
> > +
> > +  std::error_code EC;
> > +  auto Out =
llvm::make_unique<tool_output_file>(OutputFilename, EC,
> > +                                                 sys::fs::F_Text);
> > +  CANON_CHECK(!EC, EC.message());
> > +  raw_pwrite_stream *OS = &Out->os();
> > +
> > +  legacy::PassManager PM;
> > +  TargetLibraryInfoImpl TLII(Triple(M->getTargetTriple()));
> > +  PM.add(new TargetLibraryInfoWrapperPass(TLII));
> > +  M->setDataLayout(Target->createDataLayout());
> > +  setFunctionAttributes(CPUStr, FeaturesStr, *M);
> > +
> > +  LLVMTargetMachine &TM = static_cast<LLVMTargetMachine
&>(*Target);
> > +  TargetPassConfig &TPC = *TM.createPassConfig(PM);
> > +  PM.add(&TPC);
> > +  MachineModuleInfo *MMI = new MachineModuleInfo(&TM);
> > +  PM.add(MMI);
> > +  TPC.printAndVerify("");
> > +
> > +  StringRef PassName = "mir-canonicalizer";
> > +  const PassRegistry *PR = PassRegistry::getPassRegistry();
> > +  const PassInfo *PI = PR->getPassInfo(PassName);
> > +
> > +  CANON_CHECK(PI, argv[0] + ": " + PI->getPassName() +
" not
> registered.");
> > +  CANON_CHECK(PI->getNormalCtor(),
> > +              argv[0] + ": can't create pass: " +
PI->getPassName());
> > +
> > +  Pass *P = PI->getNormalCtor()();
> > +  PM.add(P);
> > +  TPC.printAndVerify(std::string("After ") +
> std::string(P->getPassName()));
> > +
> > +  PM.add(createPrintMIRPass(*OS));
> > +
> > +  // Before executing passes, print the final values of the LLVM
> options.
> > +  cl::PrintOptionValues();
> > +
> > +  PM.run(*M);
> > +
> > +  CANON_CHECK(!(*static_cast<bool
*>(Context.getDiagnosticContext())),
> > +              argv[0] + " Error: exiting...");
> > +
> > +  Out->keep();
> > +  return 0;
> > +}
> > +
> >
> > _______________________________________________
> > LLVM Developers mailing list
> > llvm-dev at lists.llvm.org
> > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170906/b1e1642c/attachment-0001.html>

Matthias Braun via llvm-dev

2017-Sep-06 22:36 UTC

head link

[llvm-dev] [RFC] mir-canon: A new tool for canonicalizing MIR for cleaner diffing.

This would be easier to discuss on llvm-commits and phabricator.
> On Aug 22, 2017, at 2:09 PM, Puyan Lotfi via llvm-dev <llvm-dev at
lists.llvm.org> wrote:
> 
> 
> Patch for review.
> 
> 
> 
> 
> On Mon, Aug 21, 2017 at 11:45 PM Puyan Lotfi <puyan.lotfi.llvm at
gmail.com <mailto:puyan.lotfi.llvm at gmail.com>> wrote:
> Ping.
> 
> Still working on preparing code for review. Will have a patch for review
ready in the coming days.
> 
> PL
> 
> On Tue, Aug 15, 2017 at 12:06 PM Puyan Lotfi <puyan.lotfi.llvm at
gmail.com <mailto:puyan.lotfi.llvm at gmail.com>> wrote:
> Hi,
> 
> My name is Puyan and I've been exploring ways to improve the state of
instruction level diffing using llvm and MIR. Below is a proposal for a new llvm
tool meant to address issues encountered when diffing at the machine level.
I'm eager to hear the community's feedback.
> 
> Thanks
> 
> PL
> 
> mir-canon: A new tool for canonicalizing MIR for cleaner diffing.
> 
> 
> Problem Statement and Context:
> 
> Diff-tools are regularly used for comparing IR and assembly files. This
often involves reasoning through differences that are semantically equivalent
and it can be very time consuming for a person to do said reasoning.
> 
> Specifically in the context of GlobalISel development there are problems of
correctness verification. There is a need to compare two programs, compiled from
identical IR by two different instruction selectors in a way where the true
differences stand out.Sounds interesting!
I hope some people involved in GlobalISel will chime in the discussion about the
patch.
> 
> 
> Proposal:
> 
> We propose a new tool that we have tentatively named 'mir-canon'
that performs canonical transformations on MIR. The goal is for MIR
pre-processed with mir-canon to show fewer differences than if it were not
pre-processed.
> 
> At the time of this writing we have a prototype canonicalization tool.
We’ve come up with some techniques that show promise and would like to open
discussion with the community to get feedback and suggestions on refining said
techniques. Currently we think of this as an open ended project.
> 
> 
> Techniques:
> 
> Our prototype does the following for each basic block in a Reverse Post
Ordering:
> 
> Canonical instruction ordering is done by moving a given instruction as
close to the nearest use of its def as possible.From glancing at the patch this only seems to respect vreg dependencies but I
couldn't see memory / side effect tracking (ScheduleDAGInstrs may be the
right tool for the job).
> 
> Next, canonical VReg renaming is done by building a collection of candidate
instructions that can be thought of as sinks in the def-use graph: they are
typically instructions that write to physical registers or store to memory.
These candidates are used as the root of a breadth first walk over the vreg
operand def-use graph that determines a canonical vreg ordering.
> 
> Using said canonical vreg ordering we rename monotonically, but before we
do this we skip several vreg values in order to increase the chance that we land
on the same vreg number for two different input MIR files. We also do this to
reduce the chances that a difference in previously def-use walks will affect the
vreg renaming for subsequent walks. This skipping step could be thought of as a
kind of vreg number reckoning: we skip modulo n vregs so that we are likely to
land on the same vreg for two different files.It may be interesting to extend the .mir file format here to allow arbitrary
names for vregs.

So instead of

block1:
%0 = foo
%1 = bar
   use %1

block2:
%100 = baz

you could do something along the lines of:

block1:
  %1_foo0 = foo
  %1_bar1 = bar
         use %1_bar1

block2:
  %2_baz = baz

That way you could avoid the vreg numbering games and by encoding the
instruction type you are even robust against insertion/removal of instructions
of different types.

I'd certainly be open to some mir extension patches.

- Matthias

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170906/a2f20f72/attachment.html>

Puyan Lotfi via llvm-dev

2017-Sep-07 18:21 UTC

head link

[llvm-dev] [RFC] mir-canon: A new tool for canonicalizing MIR for cleaner diffing.

On Wed, Sep 6, 2017 at 3:36 PM, Matthias Braun <mbraun at apple.com>
wrote:
> This would be easier to discuss on llvm-commits and phabricator.
>
> On Aug 22, 2017, at 2:09 PM, Puyan Lotfi via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>
> Patch for review.
>
>
>
>
> On Mon, Aug 21, 2017 at 11:45 PM Puyan Lotfi <puyan.lotfi.llvm at
gmail.com>
> wrote:
>
>> Ping.
>>
>> Still working on preparing code for review. Will have a patch for
review
>> ready in the coming days.
>>
>> PL
>>
>> On Tue, Aug 15, 2017 at 12:06 PM Puyan Lotfi <puyan.lotfi.llvm at
gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> My name is Puyan and I've been exploring ways to improve the
state of
>>> instruction level diffing using llvm and MIR. Below is a proposal
for a new
>>> llvm tool meant to address issues encountered when diffing at the
machine
>>> level. I'm eager to hear the community's feedback.
>>>
>>> Thanks
>>>
>>> PL
>>>
>>> mir-canon: A new tool for canonicalizing MIR for cleaner diffing.
>>>
>>>
>>> Problem Statement and Context:
>>>
>>> Diff-tools are regularly used for comparing IR and assembly files.
This
>>> often involves reasoning through differences that are semantically
>>> equivalent and it can be very time consuming for a person to do
said
>>> reasoning.
>>>
>>> Specifically in the context of GlobalISel development there are
problems
>>> of correctness verification. There is a need to compare two
programs,
>>> compiled from identical IR by two different instruction selectors
in a way
>>> where the true differences stand out.
>>>
>> Sounds interesting!
> I hope some people involved in GlobalISel will chime in the discussion
> about the patch.
>
I'm hoping to get them using it once things get checked in and are
available in all our branches.
>
>>>
>>> Proposal:
>>>
>>> We propose a new tool that we have tentatively named
'mir-canon' that
>>> performs canonical transformations on MIR. The goal is for MIR
>>> pre-processed with mir-canon to show fewer differences than if it
were not
>>> pre-processed.
>>>
>>> At the time of this writing we have a prototype canonicalization
tool.
>>> We’ve come up with some techniques that show promise and would like
to open
>>> discussion with the community to get feedback and suggestions on
refining
>>> said techniques. Currently we think of this as an open ended
project.
>>>
>>>
>>> Techniques:
>>>
>>> Our prototype does the following for each basic block in a Reverse
Post
>>> Ordering:
>>>
>>>
>>>    - Canonical instruction ordering is done by moving a given
>>>    instruction as close to the nearest use of its def as possible.
>>>
>>> From glancing at the patch this only seems to respect vreg
dependencies
> but I couldn't see memory / side effect tracking (ScheduleDAGInstrs may
be
> the right tool for the job).
>
>I've currently not gotten to that. At the moment, I treat memory and
physical register side effects as canonicalization barriers. Doing a side
effect tracker sounds like a good next step.

>
>>>
>>>    - Next, canonical VReg renaming is done by building a collection
of
>>>    candidate instructions that can be thought of as sinks in the
def-use
>>>    graph: they are typically instructions that write to physical
registers or
>>>    store to memory. These candidates are used as the root of a
breadth first
>>>    walk over the vreg operand def-use graph that determines a
canonical vreg
>>>    ordering.
>>>
>>>
>>>
>>>    - Using said canonical vreg ordering we rename monotonically,
but
>>>    before we do this we skip several vreg values in order to
increase the
>>>    chance that we land on the same vreg number for two different
input MIR
>>>    files. We also do this to reduce the chances that a difference
in
>>>    previously def-use walks will affect the vreg renaming for
subsequent
>>>    walks. This skipping step could be thought of as a kind of vreg
number
>>>    reckoning: we skip modulo n vregs so that we are likely to land
on the same
>>>    vreg for two different files.
>>>
>>> It may be interesting to extend the .mir file format here to allow
> arbitrary names for vregs.
>
> So instead of
>
> block1:
> %0 = foo
> %1 = bar
>    use %1
>
> block2:
> %100 = baz
>
> you could do something along the lines of:
>
> block1:
>   %1_foo0 = foo
>   %1_bar1 = bar
>          use %1_bar1
>
> block2:
>   %2_baz = baz
>
> That way you could avoid the vreg numbering games and by encoding the
> instruction type you are even robust against insertion/removal of
> instructions of different types.
>
> I'd certainly be open to some mir extension patches.
>

That would actually be pretty great because currently the compile time
suffers a lot from generating all those skipped vregs.
I will certainly follow up with you as a second phase of this in order to
speed things up.


I will send a revised patch to the commits list as soon as I can.

PL

>
> - Matthias
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170907/3e0292d5/attachment-0001.html>

llvm dev - Aug 2017 - [RFC] mir-canon: A new tool for canonicalizing MIR for cleaner diffing.

[llvm-dev] [RFC] mir-canon: A new tool for canonicalizing MIR for cleaner diffing.

[llvm-dev] [RFC] mir-canon: A new tool for canonicalizing MIR for cleaner diffing.

[llvm-dev] [RFC] mir-canon: A new tool for canonicalizing MIR for cleaner diffing.

[llvm-dev] [RFC] mir-canon: A new tool for canonicalizing MIR for cleaner diffing.

[llvm-dev] [RFC] mir-canon: A new tool for canonicalizing MIR for cleaner diffing.

[llvm-dev] [RFC] mir-canon: A new tool for canonicalizing MIR for cleaner diffing.