thr3ads.net - llvm dev - [LLVMdev] [global-isel] Simplifying the simplifier [Aug 2013]

If this information is useful, please help other people find it:
Share via:

Jakob Stoklund Olesen

2013-Aug-08 23:18 UTC

[LLVMdev] [global-isel] Proposal for a global instruction selector

I am hoping that this proposal will generate a lot of feedback, and there are
many different topics to discuss. When replying to this email, please change the
subject header to something more specific, but keep the [global-isel] tag.

Thanks,
/jakob

Proposal for a Global Instruction Selector

It is becoming evident that we need a replacement for the SelectionDAG-based
instruction selector. This proposal describes the architecture of a global
instruction selector based on the MachineInstr intermediate representation.

A new instruction selector is a large project that will probably take a couple
of years to complete. This proposal only describes the desired final design, it
does not describe the path of incremental development we should follow to get
there.

This is a strawman design which shouldn't be seen as a final proposal. I am
hoping for a lot of community feedback, and I don't expect we'll have a
final design before we actually begin implementing it.

Why not legalize LLVM IR?

It's been proposed to legalize LLVM IR before instruction selection, see
http://lists.cs.uiuc.edu/pipermail/llvmdev/2013-April/061567.html. I recommend
reading the mailing list thread debating the proposal, but let me summarize the
reasons for legalizing MI instead of LLVM IR:

ABI boundaries have a too high-level representation in LLVM IR. By ABI
boundaries, I mean function call and return, landing pads, and inline assembly.

LLVM IR doesn't make it clear which function arguments go in registers and
which go on the stack. Stack frame and call frame maintenance instructions are
not represented in LLVM IR. Byval arguments are a big mystery, and some
arguments are actually passed in multiple registers, depending on their type.

While LLVM IR could be extended to represent all these things, it would pollute
the clean design of LLVM IR, and it is not the right level of abstraction for
this IR.

The getelementpointer instruction is too abstract. We want pointer arithmetic to
be exposed as normal integer arithmetic in a type system without pointers. This
would require a lot of awkward pointer-to-integer conversions in LLVM IR.

The LLVM IR type system is too high-level. We would need to get rid of
first-class aggregates and pointers. This is possible, but it would require lots
of bitcasts and other noisy instructions that are really no-ops as far as
instruction selection is concerned.

We want LLVM IR to be readable by future versions of the compiler, and we desire
that target authors can rearrange their design as the target evolves. Requiring
target authors to autoupgrade old idioms (e.g., the equivalent of X86ISD nodes)
isn't a reasonable thing.

Motivation

Everybody loves to hate SelectionDAG, but it is still useful to make its
shortcomings explicit. These are some of the goals for a new instruction
selector architecture:

We want a global instruction selector.

SelectionDAG operates on a basic block at a time, and we have been forced to
implement a number of hacks to work around that. For example, most of
CodeGenPrepare is moving instructions around to make good local instruction
selection more likely. Legalization of switches and selects must be done either
before or after instruction selection because it requires creating new basic
blocks.

A number of passes running after instruction selection are also mostly about
cleaning up after the single-basic-block selector. This includes MachineCSE,
MachineLICM, and the peephole pass.

We want a faster instruction selector.

The SelectionDAG process is quite heavyweight because it uses continuous CSE, a
whole new IR, and a mandatory scheduling phase to linearize the DAG. By
selecting directly to MI, we can avoid one IR translation phase. By using a
linearized IR, scheduling becomes optional.

We want a shared code path for fast and good instruction selection.

Currently, the fast instruction selector used for -O0 builds is a completely
separate code path. This is not healthy because it increases the likelihood of
bugs in the fast path that were not present in the slow path.

It would be better if the -O0 instruction selector were a trimmed down version
of the full instruction selector.

We want an IR that represents ISA concepts better.

The SelectionDAG IR is very good at representing LLVM IR directly, but as the
code is lowered to model target machine concepts, weird hacks are often
required. This is evident in the way too many SDNodes required to represent a
function call, or the many custom ISD nodes that targets need to define.

In many cases, custom target code knows exactly which instructions it wants to
produce, and the IR should make it possible and easy to just emit the desired
instructions directly. The MI intermediate representation isn't perfect
either, and we should plan some MI improvements as well.

The SelectionDAG concept of legal types and their mapping to a single register
class often causes problems. In some cases, it is necessary to lie about value
types, just to get the instruction selector to do the right thing.

We want a more configurable instruction selector.

Weird targets have weird requirements, and it should be possible for targets to
inject new passes into the instruction selection process. Sometimes, it may even
be required to replace a standard pass.

Overall design

The global instruction selector is implemented as a series of machine function
passes. Targets can inject their own passes or replace standard passes, using
the existing mechanism for configuring code generation pipelines.

The MI intermediate representation is extended with a set of abstract
target-independent opcodes very similar to the ISD opcodes used by SelectionDAG.
Virtual registers can be typed so they have an EVT instead of a register class.

The standard passes are:

MI builder.
Switch lowering.
Iterative legalization and cleanup.
Register bank selection.
Common subexpression elimination.
Instruction selection.
I'll describe these passes in more detail below.

MI builder

The MI builder pass translates LLVM IR to MI IR, much like the current
SelectionDAGBuilder phase. Like SelectionDAGBuilder, this pass will:

Expand LLVM IR first-class aggregate types into their constituent parts. This
also includes splitting load, store, phi, and select instructions.
Replace pointer types with target-specific integers.
Expand getelementptr instructions with integer adds and multiplies.
Map LLVM instructions to target-independent abstract MI opcodes.
Lower ABI boundaries with help from target hooks.
Unlike SelectionDAGBuilder, this pass will:

Create a 1-1 mapping of the LLVM IR CFG. No new basic blocks are created.
Preserve switch instructions.
Preserve i1 logic instructions.
Not use MERGE_VALUES instructions. We'll use a value map that can handle
aggregates.
The aggregate type expansion creates value types that can all be represented by
EVTs, and MachineRegisterInfo will be extended to allow virtual registers to
have an EVT instead of a register class. EVTs are all the integer, floating
point, and vector types from LLVM IR.

The ABI boundary lowering requires types to be broken down further into
'legal types' that can be mapped to registers. The secondary breakdown
is currently handled by TargetLowering::LowerCallTo() calling getRegisterType()
and getNumRegisters(). Most ABIs are defined in terms of C types, not LLVM IR
types, so there is a close connection between the C frontend and the ABI
lowering code in the instruction selector. It would be a good idea to have the
ABI lowering code work independently of the type system used during instruction
selection. This is made possible by having ABI lowering be part of the LLVM IR
to MI translation process.

Switch lowering

SelectionDAGBuilder is lowering switches immediately because the CFG can't
be changed during instruction selection. This isn't required with a global
instruction selector, so switch lowering can be its own pass, simplifying the
design of the MI builder.

The switch lowering pass converts switch instructions into a combination of
branch trees and jump tables. A default target-independent implementation is
possible, but targets may want to override this pass. For example, the ARM
target could try to create jump tables that would work well with the TBB/TBH
instructions to reduce code size.

It is convenient to lower switches in the MI intermediate representation because
it already has an object type for jump tables.

Legalization

SelectionDAG's concept of legal types isn't clearly defined. It seems to
be a bit random which operations must be supported before a type can be
considered legal. We'll define legal types precisely:

A type is considered legal if it can be loaded into and stored from an
allocatable register class. (And all allocatable register classes must support
copy instructions.)

We don't require other supported operations than load and store for a type
to be legal, and all types that can be loaded and stored are legal. This means
that most targets will have a much larger set of legal types than they do today.
Also note that we don't require targets to designate a register class to use
for each legal type; in fact, TableGen can compute the legal type set
automatically. Register classes can be inferred from the selected
instructions,MachineRegisterInfo::recomputeRegClass() already knows how to do
that.

With a much larger set of legal types, a separate type legalization phase
becomes superfluous. The operation legalizer must be able to do anything the
type legalizer can do anyway, so the type legalizer isn't adding any value.

The legalizer will work bottom-up and iteratively. As illegal instructions are
visited, the legalizer will apply transformations that make the current
instruction 'more legal'. Each transformation is allowed to modify a
whole chain of single-use instructions for efficiency, but it is also allowed to
only modify the current instruction in hard cases. The instructions created
don't need to be legal, the legalizer iterates until the current instruction
is legal before it moves on.

The set of new instructions created while legalizing a single instruction is fed
through an instruction simplifier that cleans up redundancies. This replaces
DAGCombine.

Clearly, more details are required here, and community input would be much
appreciated. Some specific questions that need answering are:

What does 'more legal' mean? Can we restrict the possible legalization
transformations so the iterative process is guaranteed to make progress?

A long time ago, SelectionDAG had a single legalization pass. It was broken into
multiple passes because the code got too complicated. Is the iterative approach
proposed here good enough to keep the code complexity under control?

In my opinion, it's the value mapping that gets you. If the legalization can
be broken into small well-defined transformations with no shared state, such as
a value map, then there is no reason for the code to become complicated.

Compare this to the live range splitting implemented in the register allocator.
Splitting a live range can be fairly complicated, but it is implemented as a
transaction that transforms valid IR to valid IR. We no longer have the side
maps with deferred spill code and so on; there is no shared state between live
range splits other than the IR. This makes the problem manageable.

DAGCombine is supposed to clean up legalization products, but today it seems to
be accumulating some transformations that probably belong in InstCombine. Since
it is running so late in the pass pipeline, its canonicalizing approach is
causing problems for targets that are trying to arrange the IR optimally.
Optimal and canonical are not always the same thing.

Can we get away with a gentler instruction simplifier that is more focused at
cleaning up after legalization?

Does it still work if we constrain the instruction simplifier to creating legal
operations?

Register bank selection

Many instruction set architectures have multiple register banks. X86 has three:
Integer, vector, and x87 floating point registers. (Four if you count MMX
registers as a separate bank.) Blackfin and m68k have separate pointer and data
register banks, etc. It is also common to have the same operations available in
multiple register banks. For example, most ISAs with a vector register bank
support bitwise and/or/xor operations in both the integer and vector register
banks.

SelectionDAG is using the type system to select register banks; each legal type
is mapped to a single register class. This mapping is causing a lot of strange
modeling problems in targets, and it often leads to spurious cross-class copies
with values bouncing back and forth between register banks. See for example the
strange occurrences of v1i64 in the ARM target. It is also very difficult to
make efficient use of operations that are available in multiple register banks.

The global instruction selector will assign virtual registers to register banks
explicitly, not by using the type system. The set of register banks is typically
small (2-3) and defined by the target. Modelling register banks explicitly makes
it possible to move operations between register banks to minimize cross-bank
copies which are often expensive. SPARC even requires cross-bank copies to go
through memory, as does x86 in some cases.

The register bank selection pass computes the optimal bank assignment and
inserts copy instructions when a value needs to cross banks. Sometimes, it may
be beneficial to have the same value available in two register banks
simultaneously, this can also be represented with cross-bank copy instructions.
The bank selection can also be affected by register pressure concerns. On
x86-64, for example, many i32 values could be moved to the SSE registers,
freeing up the integer registers.

The interaction between legalization and bank selection is not clear yet. For
example, i64 would be a legal type on ARM with and/or/xor/add/sub operations
supported in a D-register. Some i64 operations are not supported in D-registers,
and it would be necessary to expand those i64 operations to i32 operations in
the GPR bank. When that happens, it is most likely preferable to
'legalize' more connected i64 operations so they can be moved to the GPR
bank as well.

Instruction selection

The instruction selection pass replaces most target-independent instructions
with real target opcodes, and it ensures that all virtual registers have
register classes instead of EVTs. Some target-independent instructions, like
COPY, are not lowered until after register allocation.

SelectionDAG instruction selection is controlled only by expression types, and
the selected instructions are expected to use the unique register banks selected
by the type:

(operation, type) --> opcode
We're representing register banks explicitly, and many operations are
available in multiple banks, so the register bank needs to be part of the
instruction selection:

(operation, type, bank) --> opcode
On the other hand, when types are not used to select register banks, it becomes
really difficult to explain the difference between load i32 and load f32. The
hardware doesn't care either, it simply knows how to load 32 bits into a
given register. We can use a three-level hierarchical type system to better
describe this:

Bit width. Operations like load, store, select, and the bitwise and/or/xor only
depend on the number of bits in the operands. Their effect is independent of the
vector lane structure and whether lanes are ints or floats.

Vector lanes + lane width. Operations like shufflevector and insertelement
depend of the vector topology of the operand types, but they don't care if
vector lanes are floats or ints.

Full EVTs. Finally, arithmetic instructions actually depend on the full type of
their operands. It is worth noting, though, that the int/float distinction is
already redundantly encoded in the operations. LLVM IR has separate add and fadd
instructions, for example.

The instruction selection will only use the relevant parts of the operand type,
depending on the operation being matched. It will consider load i32, load f32,
and load v2i16 to be simply 32-bit wide loads. The target is likely to have
multiple 32-bit load instructions. The explicit register bank is used to pick
the right one.

Apart from the way operations and types are matched, the instruction selection
algorithm is a lot like the current SelectionDAG algorithm. The process is less
transactional than it is in SelectionDAG. Specifically, targets are allowed to
insert pre-selected instructions at any time.

The instruction selection algorithm is global, which means it can look upwards
in the dominator tree when matching patterns. A cost model is required to
determine when it is a good idea to fold instructions outside the current loop.
The same applies to matching instructions with multiple uses.

Ramifications

This proposal goes a lot further than simply implementing the same instruction
selector on a new intermediate representation. It's worthwhile to point out
some of the effects of the proposed design.

More legal types

The new definition of a legal type is very permissive, and there are going to be
many more legal types in most targets. It is also worth noting that the legality
of a type is a function of the type's bit size only. In other words, if f64
is a legal type, so is i64, v2f32, and even v64i1. On the ARM target, for
example, these types would be legal:

All 8-bit types via ldrb/strb to GPR. (i8, v1i8, v2i4, v4i2, v8i1)
All 16-bit types via ldrh/strh to GPR. (i16, f16, v1i16, v2i8, ...)
All 32-bit types via ldr/str to GPR and vldr/vstr to SPR.
All 64-bit types via ldrd/strd to GPRPair and vldr/vstr to DPR.
All 128-bit types via vld1/vst1 to DPair.
All 192-bit types via vld1/vst1 to DTriple.
All 256-bit types via vld1/vst1 to DQuad.
This larger set of legal types also makes it easier to handle things like
extractelement <8 x i8> which currently produces an illegal type and is
thus obfuscated by the type legalizer.

An i8 live range could exist in an ARM function as long as the value is only
loaded and stored. We could even use a register class with a single-byte spill
size. Different spill sizes for the same registers is already supported. That is
how the x86 target holds 32-bit floats in the 128-bit xmm registers. If the i8
value has any illegal arithmetic instructions, the legalizer will expand it to
i32, and the expansion should probably be propagated to other basic blocks and
threaded through phis.

Most of CodeGenPrepare can go away

The instruction selector will be able to match patterns across basic blocks by
looking up the dominator tree. This means that CodeGenPrepare no longer needs to
sink GEPs and compares down to their uses. CodeGenPrepare has a lot of
addressing mode matching code which can simply be deleted.

Other transformations that currently live in CodeGenPrepare could probably be
moved to the MI representation, although it isn't quite clear if we would
want to do that.

Better modeling of ISA concepts

Try compiling this function for SPARC64:

float f(int *p) {
  return *p;
}
It's a simple load and float-to-int conversion which produces this IR:

define float @f(i32* nocapture %p) {
entry:
  %l = load i32* %p, align 4
  %conv = sitofp i32 %l to float
  ret float %conv
}
SelectionDAG generates this SPARC64 assembly:

ld [%i0], %l0
st %l0, [%fp+2043]
ld [%fp+2043], %f0
fitos %f0, %f1
The integer value %l is first loaded into an integer register (%l0) and then
bitcasted to a float so it can be used by the fitos instruction which reads an
integer in%f0 and writes its float result to %f1. The SPARC target is
essentially creating a DAG representing this IR:

define float @f(i32* nocapture %p) {
entry:
  %l = load i32* %p, align 4
  %l2 = bitcast i32 %l to float
  %conv = sitofp float %l2 to float
  ret float %conv
}
The bitcast instruction really represents a cross-bank register copy, and the
sitofp instruction no longer typechecks; it really does read an integer from a
floating point register, but SelectionDAG has no way of representing that. The
type labels are no longer representing types, they are purely used to select
register banks. The %l2 is not a float, and it isn't being interpreted as a
float by the fitos instruction. It's an integer in the floating point
register bank.

By modeling register banks better, we don't have to lie to the instruction
selector or depend on its type checking being defunct, and we get better results
for free. Initially, the MI builder would produce code like this, with default
register banks labels derived from value types:

define float @f(i32* nocapture %p) {
entry:
  %l:IntBank = load i32* %p, align 4
  %conv:FloatBank = sitofp i32 %l to float
  ret float %conv
}
The legalizer recognizes that SPARC only supports sitofp in the FloatBank
register banks, so it inserts a cross-bank copy:

define float @f(i32* nocapture %p) {
entry:
  %l:IntBank = load i32* %p, align 4
  %l2:FloatBank = COPY %l
  %conv:FloatBank = sitofp i32 %l2 to float
  ret float %conv
}
Finally, the bank selection pass can eliminate the cross-bank copy by moving the
%l value to the floating point register bank:

define float @f(i32* nocapture %p) {
entry:
  %l:FloatBank = load i32* %p, align 4
  %conv:FloatBank = sitofp i32 %l to float
  ret float %conv
}
Now we have an i32 load into the floating point register bank which is fine, the
instruction selector simply has a pattern that matches any 32-bit load targeting
the floating point bank, and we get this code:

ld [%i0], %f0
fitos %f0, %f1
More tuned targets like ARM already has peephole patterns that handle code like
this, but it shouldn't be necessary to manually fix all these cases. The ISA
fully supports integers in the 'floating point' register banks, so the
instruction selector should be able to model that as a first class concept.

Fragile legal types

The new model allows legal types with very few legal operations, and this
creates extra challenges for the legalizer. Some legal types are
'fragile' in the sense that it can be beneficial to split even legal
operations to avoid too many cross bank copies. Consider this ARM program:

void dot(uint64_t *a, uint64_t *p, uint64_t *q, unsigned n) {
  uint64_t s = 0;
  while (n--)
    s += *p++ * *q++;
  *a = s;
}
which produces this IR after the MI builder has expanded GEPs: (I am borrowing
LLVM IR syntax to describe the MI IR.)

define void @dot(i32 %a, i32 %p, i32 %q, i32 %n) {
entry:
  %sum0:i64,VecBank = const i64 0
  %cmp1 = icmp eq i32 %n, 0
  br i1 %cmp1, label %exit, label %loop

loop:
  %iv:i32,IntBank = phi i32 [ %iv2, %loop ], [ %n, %entry ]
  %p1:i32,IntBank = phi i32 [ %p2, %loop ], [ %p, %entry ]
  %q1:i32,IntBank = phi i32 [ %q2, %loop ], [ %q, %entry ]
  %sum1:i64,VecBank = phi i64 [ %sum2, %loop ], [ %sum0, %entry ]
  %iv2:i32,IntBank = add i32 %iv, -1
  %p2:i32,IntBank = add i32 %p1, i32 8
  %q2:i32,IntBank = add i32 %q1, i32 8
  %lp:i64,VecBank = load %p1, align 4
  %lq:i64,VecBank = load %q1, align 4
  %prod:i64,VecBank = mul i64 %lp, %lq
  %sum2:i64,VecBank = add i64 %prod, %sum1
  %cmp2 = icmp eq i32 %iv, 0
  br i1 %cmp2, label %exit, label %entry

exit:
  %s:i64,VecBank = phi i64 [ %sum0, %entry ], [ %sum2, %loop ]
  store i64 %s, i32 %a, align 4
  ret void
}
These instructions are all legal on ARM, except for the mul i64 instruction
which the legalizer will split and move to the integer register bank:

define void @dot(i32 %a, i32 %p, i32 %q, i32 %n) {
entry:
  %sum0:i64,VecBank = const i64 0
  %cmp1 = icmp eq i32 %n, 0
  br i1 %cmp1, label %exit, label %loop

loop:
  %iv:i32,IntBank = phi i32 [ %iv2, %loop ], [ %n, %entry ]
  %p1:i32,IntBank = phi i32 [ %p2, %loop ], [ %p, %entry ]
  %q1:i32,IntBank = phi i32 [ %q2, %loop ], [ %q, %entry ]
  %sum1:i64,VecBank = phi i64 [ %sum2, %loop ], [ %sum0, %entry ]
  %iv2:i32,IntBank = add i32 %iv, -1
  %p2:i32,IntBank = add i32 %p1, i32 8
  %q2:i32,IntBank = add i32 %q1, i32 8
  %lp:i64,VecBank = load %p1, align 4
  %lq:i64,VecBank = load %q1, align 4

  %lpl:i32,IntBank = extract_element i64 %lp, 0
  %lph:i32,IntBank = extract_element i64 %lp, 1
  %lql:i32,IntBank = extract_element i64 %lq, 0
  %lqh:i32,IntBank = extract_element i64 %lq, 1

  %prodl:i32,IntBank, %prodh0:i32,IntBank = umul_lohi i32 %lpl, %lql
  %prodh1:i32,IntBank = mul i32 %lpl, %lqh
  %prodh2:i32,IntBank = mul i32 %lph, %lql
  %prodh3:i32,IntBank = add i32 %prodh0, %prodh1
  %prodh:i32,IntBank = add i32 %prodh2, %prodh3

  %prod:i64,VecBank = build_pair %prodl, %prodh

  %sum2:i64,VecBank = add i64 %prod, %sum1
  %cmp2 = icmp eq i32 %iv, 0
  br i1 %cmp2, label %exit, label %entry

exit:
  %s:i64,VecBank = phi i64 [ %sum0, %entry ], [ %sum2, %loop ]
  store i64 %s, i32 %a, align 4
  ret void
}
All operations are now legal in their respective register banks, and an -O0
build would probably stop here. However, this code is quite expensive to execute
because of the many cross-bank copies, and the bank selector pass will clean
that up. First, the two 64-bit loads are moved to the integer register bank
where their values are needed. Second, the legal add i64 operation is split so
it can also be moved to the integer register bank, and the split is threaded
through the%sum1 phi to eliminate all cross-bank copies:

define void @dot(i32 %a, i32 %p, i32 %q, i32 %n) {
entry:
  %sum0l:i32,IntBank = const i32 0
  %sum0h:i32,IntBank = const i32 0
  %cmp1 = icmp eq i32 %n, 0
  br i1 %cmp1, label %exit, label %loop

loop:
  %iv:i32,IntBank = phi i32 [ %iv2, %loop ], [ %n, %entry ]
  %p1:i32,IntBank = phi i32 [ %p2, %loop ], [ %p, %entry ]
  %q1:i32,IntBank = phi i32 [ %q2, %loop ], [ %q, %entry ]
  %sum1l:i32,IntBank = phi i32 [ %sum2l, %loop ], [ %sum0l, %entry ]
  %sum1h:i32,IntBank = phi i32 [ %sum2h, %loop ], [ %sum0h, %entry ]
  %iv2:i32,IntBank = add i32 %iv, -1
  %p2:i32,IntBank = add i32 %p1, i32 8
  %q2:i32,IntBank = add i32 %q1, i32 8
  %lp:i64,IntBank = load %p1, align 4
  %lq:i64,IntBank = load %q1, align 4

  %lpl:i32,IntBank = extract_element i64 %lp, 0
  %lph:i32,IntBank = extract_element i64 %lp, 1
  %lql:i32,IntBank = extract_element i64 %lq, 0
  %lqh:i32,IntBank = extract_element i64 %lq, 1

  %prodl:i32,IntBank, %prodh0:i32,IntBank = umul_lohi i32 %lpl, %lql
  %prodh1:i32,IntBank = mul i32 %lpl, %lqh
  %prodh2:i32,IntBank = mul i32 %lph, %lql
  %prodh3:i32,IntBank = add i32 %prodh0, %prodh1
  %prodh:i32,IntBank = add i32 %prodh2, %prodh3

  %sum2l:i32,IntBank, carry = addc i32 %prodl, %sum1l
  %sum2h:i32,IntBank = adde %prodh, %sum1h

  %cmp2 = icmp eq i32 %iv, 0
  br i1 %cmp2, label %exit, label %entry

exit:
  %sl:i32,IntBank = phi i32 [ %sum0l, %entry ], [ %sum2l, %loop ]
  %sh:i32,IntBank = phi i32 [ %sum0h, %entry ], [ %sum2h, %loop ]
  %s:i64,IntBank = build_pair %sl, %sh
  store i64 %s, i32 %a, align 4
  ret void
}
The bank selection pass will need to weigh the cost of an extra adde instruction
against the cost of a cross-bank copy. Sometimes, like in this example, it is
preferable to split legal operations. On the other hand, if it weren't for
the multiplication, the entire function would run in the vector register bank. A
single instruction can cause entire expression webs to migrate.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20130808/76eb7d06/attachment.html>

Hal Finkel

2013-Aug-08 23:59 UTC

head link

[LLVMdev] [global-isel] Simplifying the simplifier

----- Original Message -----
[snip]
> 
> DAGCombine is supposed to clean up legalization products, but today
> it seems to be accumulating some transformations that probably
> belong in InstCombine. Since it is running so late in the pass
> pipeline, its canonicalizing approach is causing problems for
> targets that are trying to arrange the IR optimally. Optimal and
> canonical are not always the same thing.
> 
> Can we get away with a gentler instruction simplifier that is more
> focused at cleaning up after legalization?
>From what I've observed, there tends to be a lot to cleanup after
legalization, especially because of the complexity of expanding GEP and all
kinds of vector operations. Can you provide some examples of simplifications in
DAGCombine that you think that we won't need with this new infrastructure?
What might be better is to put some abstract interface between instcombine and
the IR, so that instcombine can be run on these pseudo-MIs as well as on IR.

Thanks again,
Hal
>     *
> 
> Does it still work if we constrain the instruction simplifier to
> creating legal operations?
> 
[snip]> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> 
-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory

David Chisnall

2013-Aug-09 09:16 UTC

head link

[LLVMdev] [global-isel] Random comments on Proposal for a global instruction selector

On 9 Aug 2013, at 00:18, Jakob Stoklund Olesen <stoklund at 2pi.dk> wrote:
> I am hoping that this proposal will generate a lot of feedback, and there
are many different topics to discuss. When replying to this email, please change
the subject header to something more specific, but keep the [global-isel] tag.
Subject changed, but I'm not sure if helps...

Overall, I really like this design (and would have had a much easier 18 months
if we'd had it two years ago).  Comments inline.
> Thanks,
> /jakob
> 
> Proposal for a Global Instruction Selector
> 
> It is becoming evident that we need a replacement for the
SelectionDAG-based instruction selector. This proposal describes the
architecture of a global instruction selector based on the MachineInstr
intermediate representation.
> 
> A new instruction selector is a large project that will probably take a
couple of years to complete. This proposal only describes the desired final
design, it does not describe the path of incremental development we should
follow to get there.
> 
> This is a strawman design which shouldn't be seen as a final proposal.
I am hoping for a lot of community feedback, and I don't expect we'll
have a final design before we actually begin implementing it.
> 
> Why not legalize LLVM IR?
> 
> It's been proposed to legalize LLVM IR before instruction selection,
see http://lists.cs.uiuc.edu/pipermail/llvmdev/2013-April/061567.html. I
recommend reading the mailing list thread debating the proposal, but let me
summarize the reasons for legalizing MI instead of LLVM IR:
> 
> 	• ABI boundaries have a too high-level representation in LLVM IR. By ABI
boundaries, I mean function call and return, landing pads, and inline assembly.
> 
> LLVM IR doesn't make it clear which function arguments go in registers
and which go on the stack. Stack frame and call frame maintenance instructions
are not represented in LLVM IR. Byval arguments are a big mystery, and some
arguments are actually passed in multiple registers, depending on their type.
> 
> While LLVM IR could be extended to represent all these things, it would
pollute the clean design of LLVM IR, and it is not the right level of
abstraction for this IR.
There is a lot of black magic required for ABI support.  I can read the SysV ABI
supplement for a given architecture and understand how, from my source language,
types should be represented and where function arguments should go.  There is
then almost no documentation on how this maps to the IR, so the only way when
writing a new front end for an existing back end is to see what clang generates.
The 'clean' design of the IR here is not very clean, as the IR contains
something that is neither what the back end nor the front end wants and has a
non-obvious mapping to either.
> 	• The getelementpointer instruction is too abstract. We want pointer
arithmetic to be exposed as normal integer arithmetic in a type system without
pointers. This would require a lot of awkward pointer-to-integer conversions in
LLVM IR.
For our back end, we absolutely do not want a type system without pointers - not
only do we have separate registers for pointers, we also have separate
instructions for manipulating them and representing this in SelectionDAG
requires some very ugly hacks.  This is also the case for several DSPs, which
have separate address and data registers, and for upcoming Intel chips with MPX
that have 4 hardware bounds registers that need keeping in sync with pointers. 
Something like GEP is actually a very good model for these, because it allows
the same bounds register to be used for a set of pointer operations, however it
would be fine to have a single pointer type that was independent of its pointee
type (or small set of different-sized ones, which will be useful for several
GPUs) and a lower-level GEP that was just a byte offset.

For the architectures that don't distinguish between pointers and integers,
there should be some option of lowering to integer types, however the
information that a particular register contains an
> Motivation
> 
> Everybody loves to hate SelectionDAG, but it is still useful to make its
shortcomings explicit. These are some of the goals for a new instruction
selector architecture:
I would add:

It is very difficult to extend SelectionDAG / TableGen with new value types that
are not very similar to existing ones.  For example, adding a new kind of vector
is easy, but adding a fat pointer representation is not really possible.
> Overall design
> 
> The global instruction selector is implemented as a series of machine
function passes. Targets can inject their own passes or replace standard passes,
using the existing mechanism for configuring code generation pipelines.
> 
> The MI intermediate representation is extended with a set of abstract
target-independent opcodes very similar to the ISD opcodes used by SelectionDAG.
Virtual registers can be typed so they have an EVT instead of a register class.
> 
> The standard passes are:
> 
> 	• MI builder.
> 	• Switch lowering.
> 	• Iterative legalization and cleanup.
> 	• Register bank selection.
> 	• Common subexpression elimination.
> 	• Instruction selection.
> I'll describe these passes in more detail below.
> 
> MI builder
> 
> The MI builder pass translates LLVM IR to MI IR, much like the current
SelectionDAGBuilder phase. Like SelectionDAGBuilder, this pass will:
> 
> 	• Expand LLVM IR first-class aggregate types into their constituent parts.
This also includes splitting load, store, phi, and select instructions.
> 	• Replace pointer types with target-specific integers.
> 	• Expand getelementptr instructions with integer adds and multiplies.
> 	• Map LLVM instructions to target-independent abstract MI opcodes.
> 	• Lower ABI boundaries with help from target hooks.
> Unlike SelectionDAGBuilder, this pass will:
> 
> 	• Create a 1-1 mapping of the LLVM IR CFG. No new basic blocks are
created.
> 	• Preserve switch instructions.
> 	• Preserve i1 logic instructions.
> 	• Not use MERGE_VALUES instructions. We'll use a value map that can
handle aggregates.
> The aggregate type expansion creates value types that can all be
represented by EVTs, and MachineRegisterInfo will be extended to allow virtual
registers to have an EVT instead of a register class. EVTs are all the integer,
floating point, and vector types from LLVM IR.
That all sounds very nice and clean.  I'd also like to represent pointers
(including their address space, but not their pointee type) in this level and
then have an optional pass (or a parameter for this one) that lowers pointer
operations to integers.
> The ABI boundary lowering requires types to be broken down further into
'legal types' that can be mapped to registers. The secondary breakdown
is currently handled by TargetLowering::LowerCallTo() calling getRegisterType()
and getNumRegisters(). Most ABIs are defined in terms of C types, not LLVM IR
types, so there is a close connection between the C frontend and the ABI
lowering code in the instruction selector. It would be a good idea to have the
ABI lowering code work independently of the type system used during instruction
selection. This is made possible by having ABI lowering be part of the LLVM IR
to MI translation process.
This seems cleaner.  I wonder if there is a way of making the C ABI more
explicit to front-end authors at this point.
> The global instruction selector will assign virtual registers to register
banks explicitly, not by using the type system. The set of register banks is
typically small (2-3) and defined by the target. Modelling register banks
explicitly makes it possible to move operations between register banks to
minimize cross-bank copies which are often expensive. SPARC even requires
cross-bank copies to go through memory, as does x86 in some cases.
And on some ARM chips (e.g. Cortex A8), the via-memory path is faster than the
register-register path in some cases.
> The interaction between legalization and bank selection is not clear yet.
For example, i64 would be a legal type on ARM with and/or/xor/add/sub operations
supported in a D-register. Some i64 operations are not supported in D-registers,
and it would be necessary to expand those i64 operations to i32 operations in
the GPR bank. When that happens, it is most likely preferable to
'legalize' more connected i64 operations so they can be moved to the GPR
bank as well.
How would you model the microMIPS/Thumb-1/RISC-V 16-bit (and, in some cases,
x86) idea that a denser encoding can be used if you restrict register accesses
to the bottom half of the register set?  The registers are still the same bank,
and the set of operations is the same, but in some cases much more expensive
(having to switch in and out of microMIPS mode, for example, is very expensive).
This problem hits multiple layers in the lowering process.  I don't have a
good solution for it (nor a strong requirement for it, but if there's a
design that will make it easier to solve in the future then that would be nice,
as we and the people at Berkeley are going to be doing a reasonable amount of
work with variable-length encodings over the coming years).
> Instruction selection
> 
> The instruction selection pass replaces most target-independent
instructions with real target opcodes, and it ensures that all virtual registers
have register classes instead of EVTs. Some target-independent instructions,
like COPY, are not lowered until after register allocation.
What is a target-independent instruction?  Does a register-register add count? 
It would be nice if target-specific machine instructions could have enough
metadata that some MI passes could become target independent.  For example,
I'm not sure how many people have implemented a delay slot filler so far (I
know of 4, but there could be more) and it would be great if we could have one
generic one and enough metadata added to the instruction definitions that it
could detect branches with nops in delay slots and determine which instructions
were safe to move there.
> 	• Full EVTs. Finally, arithmetic instructions actually depend on the full
type of their operands. It is worth noting, though, that the int/float
distinction is already redundantly encoded in the operations. LLVM IR has
separate add and fadd instructions, for example.
And a separate GEP for pointer arithmetic...

David

Steve Montgomery

2013-Aug-09 15:36 UTC

head link

[LLVMdev] [global-isel] Proposal for a global instruction selector

I thoroughly agree that a more configurable instruction selector is desirable
and that DAGCombine's canonicalisation results in less than optimal
instruction selection for some targets, such as the microcontrollers I'm
working on.

I'd be happy to contribute effort to this task if and when it comes to pass.

Steve

On 9 Aug 2013, at 00:18, Jakob Stoklund Olesen <stoklund at 2pi.dk> wrote:
> I am hoping that this proposal will generate a lot of feedback, and there
are many different topics to discuss. When replying to this email, please change
the subject header to something more specific, but keep the [global-isel] tag.
> 
> Thanks,
> /jakob
[snip]

Dean Sutherland

2013-Aug-09 16:02 UTC

head link

[LLVMdev] [global-isel] Random comments on Proposal for a global instruction selector

This design does indeed appear generally cleaner than the current state of the
world.  What follows is a bit long, so here's the tl;dr version:
You *really* want to support address types in your type system, even if you
expand GEP-like operations into open arithmetic. This lets you produce good code
for a wide variety of odd-ball architectures with relative ease. Failing to
support address types makes it really difficult to support those odd-ball
architectures.

On Aug 9, 2013, at 5:16 AM, David Chisnall <David.Chisnall at
cl.cam.ac.uk>
 wrote:
>> 	• The getelementpointer instruction is too abstract. We want pointer
arithmetic to be exposed as normal integer arithmetic in a type system without
pointers. This would require a lot of awkward pointer-to-integer conversions in
LLVM IR.
> 
> For our back end, we absolutely do not want a type system without pointers
- not only do we have separate registers for pointers, we also have separate
instructions for manipulating them and representing this in SelectionDAG
requires some very ugly hacks.  This is also the case for several DSPs, which
have separate address and data registers, and for upcoming Intel chips with MPX
that have 4 hardware bounds registers that need keeping in sync with pointers. 
Something like GEP is actually a very good model for these, because it allows
the same bounds register to be used for a set of pointer operations, however it
would be fine to have a single pointer type that was independent of its pointee
type (or small set of different-sized ones, which will be useful for several
GPUs) and a lower-level GEP that was just a byte offset.
> 
> For the architectures that don't distinguish between pointers and
integers, there should be some option of lowering to integer types, however the
information that a particular register contains an

At Tartan Labs, we found it very useful to represent three different kinds of
information: the type of each value, the result context of each operation, and
whether operations required checks (or not).

The type system should absolutely know about addresses! At Tartan, the backend
(Optimizer & code-gen) concept of "type" included more kinds of
types than those described by the OP. Specifically, when compiling for various
DSPs and other unusual architectures, we found it was crucial to know not only
which values were "addresses", but also what specific kind of address
they were. Specific kinds of addresses were used to support machines with
multiple address spaces, fat pointers of various kinds (e.g., [pointer-to-array,
pointer-to-bounds-info], [pointer, capability], or even [pointer, capability,
bounds-info] tuples) and so on.  Consider, for example a machine with a split
Harvard architecture, where there are separate buses for the Code, Data-A and
Data-B address spaces. Given a "pointer" value, you must emit
different instructions to load from one of those spaces; the same bit-pattern in
the value references completely different memory.  On other architectures, you
also had to use different registers to access the different address spaces. And
you only get good performance by keeping all the code and data buses busy
simultaneously.  The machine with fat-pointers-with-capabilities required the
use of both special instructions and special registers for operations on those
pointer-with-capa values. Using ordinary (e.g., non-capa) instructions or
registers cleared the capability bits and so wiped out access to the protected
resources. Generating good code for architectures like these requires richer
knowledge of the types of the values the compiler is working with.

Result context may not matter for a 3-address IR such as LLVM uses.
The simplest result context system we used had three values: address, value, and
flow. This mattered because we had a tree-based IR, in which expression
temporaries were implicit. So we needed context information to guide allocation
of expression temporaries.  Given a 3-address IR, I suspect that at least
address and value context are redundant with the type information for the
destination.

That leaves only "flow" as an interesting concept.  It was useful to
know when the result of an operation would be used for control flow (rather than
being stored in some other variable). This helped us make better use of
condition codes (for machines that have them), or even of multiple condition
code registers. It also helped guide flow-targetted computations and temporaries
to register/instruction combinations that also set condition codes (when
relevant).  The usefulness of flow context across a very broad mix of
architectures may suggest that one or more "flow"-like types could be
useful.

Finally, checked operations.
If you ever expect to support one or more languages that do overflow checking on
arithmetic (or pointer deref, or array bounds, or address-sanitization, or ...),
you may find it useful to know whether particular operations (a) require checks,
(b) require lack-of-checks, or (c) don't care about checks -- all of these
from the viewpoint of language semantics. In addition, when checks are required,
it's really useful to distinguish the cases where the result of the check is
statically known (e.g., guaranteed to succeed or to fail) -- we discovered this
latter property in the optimizer, as part of removing unnecessary checks. The
obvious alternative here is to require front-ends that need checks to generate
the obvious IR to perform said check. However, it's rather more difficult to
generate efficient code this way. For C & C++, I believe that unsigned
requires lack-of-checks (because it's defined to wrap), but signed integers
don't care for most implementations (wrapping or not is undefined).

Dean Sutherland
(a Tartan Labs alumnus)

Jakob Stoklund Olesen

2013-Aug-09 20:55 UTC

head link

[LLVMdev] [global-isel] Simplifying the simplifier

On Aug 8, 2013, at 4:59 PM, Hal Finkel <hfinkel at anl.gov> wrote:
> ----- Original Message -----
> [snip]
> 
>> 
>> DAGCombine is supposed to clean up legalization products, but today
>> it seems to be accumulating some transformations that probably
>> belong in InstCombine. Since it is running so late in the pass
>> pipeline, its canonicalizing approach is causing problems for
>> targets that are trying to arrange the IR optimally. Optimal and
>> canonical are not always the same thing.
>> 
>> Can we get away with a gentler instruction simplifier that is more
>> focused at cleaning up after legalization?
> 
> From what I've observed, there tends to be a lot to cleanup after
legalization, especially because of the complexity of expanding GEP and all
kinds of vector operations. Can you provide some examples of simplifications in
DAGCombine that you think that we won't need with this new infrastructure?
There is no doubt that we need a lot of cleanup after legalization.

The problem is with the approach taken by DAGCombine which is a canonicalizer
more than it is an optimizer. Specifically, it will reassociate expressions
without achieving anything by the transformation.

LSR is sometimes trying to make code look right for post-increment addressing
mode patterns, but DAGCombine messes things up before the instruction selector
gets to see them.

Thanks
/jakob

Jakob Stoklund Olesen

2013-Aug-09 21:07 UTC

head link

[LLVMdev] [global-isel] Random comments on Proposal for a global instruction selector

On Aug 9, 2013, at 2:16 AM, David Chisnall <David.Chisnall at
cl.cam.ac.uk> wrote:
> How would you model the microMIPS/Thumb-1/RISC-V 16-bit (and, in some
cases, x86) idea that a denser encoding can be used if you restrict register
accesses to the bottom half of the register set?  The registers are still the
same bank, and the set of operations is the same, but in some cases much more
expensive (having to switch in and out of microMIPS mode, for example, is very
expensive).  This problem hits multiple layers in the lowering process.  I
don't have a good solution for it (nor a strong requirement for it, but if
there's a design that will make it easier to solve in the future then that
would be nice, as we and the people at Berkeley are going to be doing a
reasonable amount of work with variable-length encodings over the coming years).
We already support Thumb-2 quite well with the CostPerUse field in register
descriptors. The register allocator tries to use the cheap half of the register
bank more often to allow the compact encoding to be used as much as possible.
This also applies to the x86-64 REX prefix.

Having to switch CPU modes mid-function sounds more like a research project. I
suppose you could model MIPS/microMIPS as two separate register banks, but you
would at least need a custom bank selector pass.

Thanks,
/jakob

Jakob Stoklund Olesen

2013-Aug-09 21:15 UTC

head link

[LLVMdev] [global-isel] Random comments on Proposal for a global instruction selector

On Aug 9, 2013, at 2:16 AM, David Chisnall <David.Chisnall at
cl.cam.ac.uk> wrote:
> For our back end, we absolutely do not want a type system without pointers
- not only do we have separate registers for pointers, we also have separate
instructions for manipulating them and representing this in SelectionDAG
requires some very ugly hacks. This is also the case for several DSPs, which
have separate address and data registers, and for upcoming Intel chips with MPX
that have 4 hardware bounds registers that need keeping in sync with pointers.
I am a bit skeptical about the need for pointer types; I don’t think it is the
right level of abstraction for an instruction selector. The processors I know
about with separate address and data registers (Blackfin, m86k, TriCore) would
be modeled perfectly by the register bank labels in the proposal.

Won’t that work for your case?

Thanks,
/jakob

Jakob Stoklund Olesen

2013-Aug-09 22:52 UTC

head link

[LLVMdev] [global-isel] Simplifying the simplifier

On Aug 8, 2013, at 4:59 PM, Hal Finkel <hfinkel at anl.gov> wrote:
> What might be better is to put some abstract interface between instcombine
and the IR, so that instcombine can be run on these pseudo-MIs as well as on IR.
I like the idea of sharing code between IR and MI passes through an abstract
interface. I think that later stages in the IR pipeline also need an instruction
optimizer instead of a canonicalizer.

An alternative approach would be to describe these transformations in a DSL
instead of C++.

Thanks,
/jakob

Nuno Lopes

2013-Aug-10 14:32 UTC

head link

[LLVMdev] [global-isel] Simplifying the simplifier

Hi Jakob,

Thanks for creating this interesting proposal.
Let me just comment on this part:
>> What might be better is to put some abstract interface between 
>> instcombine and the IR, so that instcombine can be run on these 
>> pseudo-MIs as well as on IR.
>
> I like the idea of sharing code between IR and MI passes through an 
> abstract interface. I think that later stages in the IR pipeline also need 
> an instruction optimizer instead of a canonicalizer.
>
> An alternative approach would be to describe these transformations in a 
> DSL instead of C++.
>  *Legalization*
>  - What does 'more legal' mean? Can we restrict the possible
legalization
> transformations so the iterative process is guaranteed to make progress?

I've been thinking for a while that we could express the instcombine 
transformations in a DSL, and then generate the C++ code from there.  The 
advantage is that if we formalize the LLVM IR, then we can check the 
correctness of all the instcombine transformations for "free".
Moreover, instcombine has also suffered from infinite loops in the past 
(because canonicalization did not make progress for some inputs), which is 
also your concern with legalization of MI.  We have algorithms to prove 
termination of rewriting systems, so I believe we could also prove progress 
for both instcombine and MI legalization.

I'm mixing MI legalization and instcombine, since I think that the 
correctness and progress checking technology that we would need is the 
exactly the same.
I wouldn't mind working on this project.  But the time-frame on my side is 
academic, which may mean 1 or 2 years to be completed.

Nuno

Jakob Stoklund Olesen

2013-Aug-11 03:52 UTC

head link

[LLVMdev] [global-isel] Simplifying the simplifier

On Aug 10, 2013, at 7:32 AM, Nuno Lopes <nunoplopes at sapo.pt> wrote:
>> I like the idea of sharing code between IR and MI passes through an
abstract interface. I think that later stages in the IR pipeline also need an
instruction optimizer instead of a canonicalizer.
>> 
>> An alternative approach would be to describe these transformations in a
DSL instead of C++.
> 
>> *Legalization*
>> - What does 'more legal' mean? Can we restrict the possible
legalization transformations so the iterative process is guaranteed to make
progress?
> 
> 
> I've been thinking for a while that we could express the instcombine
transformations in a DSL, and then generate the C++ code from there.  The
advantage is that if we formalize the LLVM IR, then we can check the correctness
of all the instcombine transformations for "free".
> Moreover, instcombine has also suffered from infinite loops in the past
(because canonicalization did not make progress for some inputs), which is also
your concern with legalization of MI.  We have algorithms to prove termination
of rewriting systems, so I believe we could also prove progress for both
instcombine and MI legalization.
> 
> I'm mixing MI legalization and instcombine, since I think that the
correctness and progress checking technology that we would need is the exactly
the same.
> I wouldn't mind working on this project.  But the time-frame on my side
is academic, which may mean 1 or 2 years to be completed.
This sounds promising. But we have some requirements that textbook rewriting
systems can't handle:

- Expressions are DAGs, not trees.
- Targets can add custom rewriting rules and override standard rules.
- Rules will have predicates. Some predicates are static and only depend on the
subtarget, some predicates will depend on the pattern being matched.
- The system won't be convergent if you ignore all the predicates.

So the question is, given those requirements, can we still prove termination?

Thanks,
/jakob

James Courtier-Dutton

2013-Aug-11 09:24 UTC

head link

[LLVMdev] [global-isel] Simplifying the simplifier

On 10 August 2013 15:32, Nuno Lopes <nunoplopes at sapo.pt>
wrote:> Hi Jakob,
>
> Thanks for creating this interesting proposal.
> Let me just comment on this part:
>
>
>>> What might be better is to put some abstract interface between
>>> instcombine and the IR, so that instcombine can be run on these
pseudo-MIs
>>> as well as on IR.
>>
>>
>> I like the idea of sharing code between IR and MI passes through an
>> abstract interface. I think that later stages in the IR pipeline also
need
>> an instruction optimizer instead of a canonicalizer.
>>
>> An alternative approach would be to describe these transformations in a
>> DSL instead of C++.
>
>
>>  *Legalization*
>>  - What does 'more legal' mean? Can we restrict the possible
legalization
>> transformations so the iterative process is guaranteed to make
progress?
>
>
>
> I've been thinking for a while that we could express the instcombine
> transformations in a DSL, and then generate the C++ code from there.  The
> advantage is that if we formalize the LLVM IR, then we can check the
> correctness of all the instcombine transformations for "free".
> Moreover, instcombine has also suffered from infinite loops in the past
> (because canonicalization did not make progress for some inputs), which is
> also your concern with legalization of MI.  We have algorithms to prove
> termination of rewriting systems, so I believe we could also prove progress
> for both instcombine and MI legalization.
>
> I'm mixing MI legalization and instcombine, since I think that the
> correctness and progress checking technology that we would need is the
> exactly the same.
> I wouldn't mind working on this project.  But the time-frame on my side
is
> academic, which may mean 1 or 2 years to be completed.
>
Just a comment:
If you believe you can represent all instcombine transformations in
DSL, you then transform them to C++.
Why not work out a method for checking correctness in the C++?
Then, you would not have to convert everything to DSL, and could run
the checks on the current C++.
The code in C++ should be able to be written in a static analysis
friendly way so that we could use static analysis to prove no infinite
loops.

James

Rafael Espíndola

2013-Aug-12 16:24 UTC

head link

[LLVMdev] [global-isel] Proposal for a global instruction selector

On 8 August 2013 19:18, Jakob Stoklund Olesen <stoklund at 2pi.dk>
wrote:> I am hoping that this proposal will generate a lot of feedback, and there
> are many different topics to discuss. When replying to this email, please
> change the subject header to something more specific, but keep the
> [global-isel] tag.
Sorry, looks like I cannot edit the subject line on gmail. The
question is fairly generic, so it is hopefully not too bad.

One of the nice properties of the LLVM IR is that it is complete. A FE
converts all the information it has to LLVM, and there is then no
going back to AST. This is what lets us serialize it and have language
independent tools to manipulate it.

The lower level IRs we currently use don't have that property. DAGs
and MI refer back to LLVM IR. This creates some redundancy and makes
serialization harder. It also creates problems when a pass wants to
modify information that is in the llvm IR. For example, we have a
memory leak in ARMFastISel.cpp:2194 :-(

So the question is if the plan is to make MI a "complete" IR, that we
can serialize independently of the LLVM IR, run a single pass on, etc.

Thanks,
Rafael

Bob Wilson

2013-Aug-12 16:33 UTC

head link

[LLVMdev] [global-isel] Proposal for a global instruction selector

On Aug 12, 2013, at 9:24 AM, Rafael Espíndola <rafael.espindola at
gmail.com> wrote:
> On 8 August 2013 19:18, Jakob Stoklund Olesen <stoklund at 2pi.dk>
wrote:
>> I am hoping that this proposal will generate a lot of feedback, and
there
>> are many different topics to discuss. When replying to this email,
please
>> change the subject header to something more specific, but keep the
>> [global-isel] tag.
> 
> Sorry, looks like I cannot edit the subject line on gmail. The
> question is fairly generic, so it is hopefully not too bad.
> 
> One of the nice properties of the LLVM IR is that it is complete. A FE
> converts all the information it has to LLVM, and there is then no
> going back to AST. This is what lets us serialize it and have language
> independent tools to manipulate it.
> 
> The lower level IRs we currently use don't have that property. DAGs
> and MI refer back to LLVM IR. This creates some redundancy and makes
> serialization harder. It also creates problems when a pass wants to
> modify information that is in the llvm IR. For example, we have a
> memory leak in ARMFastISel.cpp:2194 :-(
> 
> So the question is if the plan is to make MI a "complete" IR,
that we
> can serialize independently of the LLVM IR, run a single pass on, etc.
Yes, we should do that.

Jakob Stoklund Olesen

2013-Aug-12 16:44 UTC

head link

[LLVMdev] [global-isel] Proposal for a global instruction selector

On Aug 12, 2013, at 9:24 AM, Rafael Espíndola <rafael.espindola at
gmail.com> wrote:
> On 8 August 2013 19:18, Jakob Stoklund Olesen <stoklund at 2pi.dk>
wrote:
>> I am hoping that this proposal will generate a lot of feedback, and
there
>> are many different topics to discuss. When replying to this email,
please
>> change the subject header to something more specific, but keep the
>> [global-isel] tag.
> 
> Sorry, looks like I cannot edit the subject line on gmail. The
> question is fairly generic, so it is hopefully not too bad.
> 
> One of the nice properties of the LLVM IR is that it is complete. A FE
> converts all the information it has to LLVM, and there is then no
> going back to AST. This is what lets us serialize it and have language
> independent tools to manipulate it.
> 
> The lower level IRs we currently use don't have that property. DAGs
> and MI refer back to LLVM IR. This creates some redundancy and makes
> serialization harder. It also creates problems when a pass wants to
> modify information that is in the llvm IR. For example, we have a
> memory leak in ARMFastISel.cpp:2194 :-(
> 
> So the question is if the plan is to make MI a "complete" IR,
that we
> can serialize independently of the LLVM IR, run a single pass on, etc.
Absolutely, although that project is somewhat orthogonal to the design of the
instruction selector.

Some back-references are pointers into IR functions, such as MBB holding a
pointer to its BasicBlock and MachineMemOperand referencing a pointer Value.
These references should definitely be cleaned up.

Other dependencies are more about borrowing LLVM infrastructure. For example, an
EVT can reference an llvm::Type, and a MachineOperand can reference both a
ConstantInt and a ConstantFP. It doesn’t seem very important to reimplement that
functionality.

Thanks,
/jakob

Possibly Parallel Threads

Search for more apparently analagous threads

llvm dev - Aug 2013 - [LLVMdev] [global-isel] Simplifying the simplifier

[LLVMdev] [global-isel] Proposal for a global instruction selector

[LLVMdev] [global-isel] Simplifying the simplifier

[LLVMdev] [global-isel] Random comments on Proposal for a global instruction selector

[LLVMdev] [global-isel] Proposal for a global instruction selector

[LLVMdev] [global-isel] Random comments on Proposal for a global instruction selector

[LLVMdev] [global-isel] Simplifying the simplifier

[LLVMdev] [global-isel] Random comments on Proposal for a global instruction selector

[LLVMdev] [global-isel] Random comments on Proposal for a global instruction selector

[LLVMdev] [global-isel] Simplifying the simplifier

[LLVMdev] [global-isel] Simplifying the simplifier

[LLVMdev] [global-isel] Simplifying the simplifier

[LLVMdev] [global-isel] Simplifying the simplifier

[LLVMdev] [global-isel] Proposal for a global instruction selector

[LLVMdev] [global-isel] Proposal for a global instruction selector

[LLVMdev] [global-isel] Proposal for a global instruction selector

Possibly Parallel Threads