thr3ads.net - llvm dev - [llvm-dev] [GlobalISel] A Proposal for global instruction selection [Jan 2016]


James Molloy via llvm-dev

2016-Jan-12 13:55 UTC

[llvm-dev] [GlobalISel] A Proposal for global instruction selection

Hi,
> I found this thinking quite difficult to explain. Does it make sense?

It might help to link to the documentation on why bitcasts are weird on
big-endian NEON: http://llvm.org/docs/BigEndianNEON.html#bitconverts

Cheers,

James

On Tue, 12 Jan 2016 at 13:23 Daniel Sanders via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> Hi,
>
>
>
> I haven't found much time to look into the LLVM-IR-level optimizations yet
> so I'm not sure how they handle bitcasts. With that disclaimer in mind, I
> expect it's fine for the LLVM-IR-level optimizations to handle them using
> either definition since they are equivalent at the LLVM-IR level. My
> thinking is that LLVM-IR is consistent about how virtual bits are assigned
> to types and that no-ops requiring a non-zero number of instructions arise
> when there is inconsistency.
>
>
>
> At the LLVM-IR level, bits 0-127 of <4 x i32> map directly onto bits 0-127
> of <2 x i64> using the identity map. It's therefore ok to interpret such
> bitcasts as zero-instruction no-ops. As far as I can tell, LLVM-IR has been
> defined such that the identity map can be used for bitcasts between all
> same-sized types, and also such that bitcasting between different-sized
> types is invalid.
>
>
>
> Similarly, most targets have a single mapping of virtual bit numbers to
> physical bit numbers for each size that is applied consistently when
> mapping a type to memory. For example 32-bits map like so:
>
> Little Endian Targets: virtual register bits {0..7,8..15,16..23,24..31}
> map to physical memory bits {0..7,8..15,16..23,24..31}
>
> Big Endian Targets: virtual register bits {0..7,8..15,16..23,24..31} map
> to physical memory bits {24..31,16..23,8..15,0..7}
>
> regardless of whether it's a float, or an i32. We therefore need zero
> instructions to re-map physical memory bits for one type onto another type.
>
>
>
> The same idea holds for physical register classes. There's a single
> consistent mapping from physical memory bits to physical register bits that
> applies for all types that can be stored in that class. As long as this is
> the case the load/store and zero-instruction interpretation of bitcasts are
> equivalent.
>
> In the case of big-endian MSA and NEON, there isn't a single consistent
> mapping from physical memory bits to physical register bits so the
> equivalence in the two definitions breaks down:
>
>                 i128: virtual register bits {0..31, 32..63, 64..95, 96..127}
> map to physical memory bits {96..127, 64..95, 32..63, 0..31}
>
>                 <4 x i32>: virtual register bits {0..31, 32..63, 64..95, 96..127}
> map to physical memory bits {0..31, 32..63, 64..95, 96..127}
>
>                 <2 x i64>: virtual register bits {0..31, 32..63, 64..95, 96..127}
> map to physical memory bits {32..63, 0..31, 96..127, 64..95}
>
> with these inconsistent mappings we require instructions to bitcast
> between the types.
>
>
>
> I found this thinking quite difficult to explain. Does it make sense?
>
>
>
> > I am fine with treating bit casts as equivalent store/load pairs in
> GISel, I just want to be sure we do not have a semantic gap between the
> LLVM-IR and the backend if we do.
>
>
>
> I think a gap would arise from not having a GISel equivalent to
> ISD::BITCAST (gBITCAST?) available when it's necessary for correctness.
> However, I agree that GISel should delete bitcasts for the common case
> where the store/load and zero-instruction definitions are equivalent.
>
>
>
> *From:* Quentin Colombet [mailto:qcolombet at apple.com]
> *Sent:* 11 January 2016 17:23
> *To:* Daniel Sanders
> *Cc:* Tim Northover (t.p.northover at gmail.com); llvm-dev
>
>
> *Subject:* Re: [llvm-dev] [GlobalISel] A Proposal for global instruction
> selection
>
>
>
> Hi Daniel,
>
>
>
> Thanks for the pointers, I wasn’t aware of the second thread you’ve
> mentioned.
>
>
>
> I may be wrong but I think LLVM-IR optimizations really treat bitcasts as
> no-op casts, in the sense that no instructions are required.
>
>
>
> Is there anyone that could chime in on that?
>
>
>
> However, it seems SelectionDAG sticks to the load/store semantics:
>
> "BITCAST - This operator converts between integer, vector and FP values,
> as if the value was *stored to memory with one type and loaded from the
> same address with the other type* (or equivalently for vector format
> conversions, etc)."
>
>
>
> I am fine with treating bit casts as equivalent store/load pairs in GISel,
> I just want to be sure we do not have a semantic gap between the LLVM-IR
> and the backend if we do.
>
>
>
> Thanks,
>
> -Quentin
>
>
>
> On Jan 11, 2016, at 7:43 AM, Daniel Sanders <Daniel.Sanders at imgtec.com>
> wrote:
>
>
>
> Hi,
>
>
>
> It was a comment by Tim that first made me aware of it (see
> http://lists.llvm.org/pipermail/llvm-dev/2013-August/064714.html but I
> think he commented on one of my patches before that).
>
>
>
> I asked about it on llvm-dev a couple of weeks later (
> http://lists.llvm.org/pipermail/llvm-dev/2013-August/064919.html)
> highlighting the contradiction and was told that 'no-op cast' referred to
> the lack of math rather than a requirement that zero instructions are used.
> It's therefore my understanding that shuffling the bits to preserve the
> load/store based definition isn't considered to be changing the bits.
>
>
>
> I think the main thing the current definition is unclear on is whether it
> refers to the bits in a physical machine register or the bits in the
> LLVM-IR virtual register. Most of the time these two views are the same but
> this doesn't quite work for big-endian MSA/NEON. For example:
>
> %0 = bitcast <4 x i32> <i32 1, i32 2, i32 3, i32 4> to <2 x i64>
>
> %0 = <2 x i64> <i64 (1 << 32) | 2, i64 (3 << 32) | 4>
>
> are equivalent to each other in LLVM-IR terms but the constants are
> physically laid out in MSA registers as:
>
> 0x00000004000000030000000200000001 # <4 x i32> <i32 1, i32 2, i32 3, i32 4>
>
> 0x00000003000000040000000100000002 # <2 x i64> <i64 (1 << 32) | 2, i64 (3 << 32) | 4>
>
> and we must therefore shuffle the bits to preserve LLVM-IR's point of view.
>
>
>
> *From:* Quentin Colombet [mailto:qcolombet at apple.com]
> *Sent:* 07 January 2016 19:58
> *To:* Daniel Sanders
> *Cc:* llvm-dev
> *Subject:* Re: [llvm-dev] [GlobalISel] A Proposal for global instruction
> selection
>
>
>
> Hi Daniel,
>
>
>
> I had a quick look at the language reference for bitcast and I have a
> different reading than what you were pointing out.
>
> Indeed, my take away is:
>
> "It is *always a **no-op cast* because no bits change with this
> conversion."
>
>
>
> In other words, deleting all bitcast instructions should be fine.
>
>
>
> My understanding of the quote you’ve highlighted is that it tells C
> programmers that this is like a memcpy, not a cast :).
>
>
>
> Cheers,
>
> -Quentin
>
> On Nov 20, 2015, at 6:53 AM, Daniel Sanders <Daniel.Sanders at imgtec.com>
> wrote:
>
>
>
> Hi,
>
>
>
> I haven't had a chance to read all of this yet, but one minor thing occurred
> to me during your presentation that I want to mention. At one point you
> mentioned deleting all the bitcast instructions since they're equivalent to
> nops but this isn't always true.
>
>
>
> The http://llvm.org/docs/LangRef.html definition of the bitcast
> instruction includes this sentence:
>
> The conversion is done as if the value had been stored to memory and read
> back as type ty2.
>
> For big-endian MSA, this is equivalent to a shuffling of the bits in the
> register because endianness only changes the byte order within each
> element. The order of the elements is unaffected by endianness. IIRC,
> big-endian NEON is the same way.
>
>
>
> *From:* llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] *On Behalf Of*
> Quentin Colombet via llvm-dev
> *Sent:* 18 November 2015 19:27
> *To:* llvm-dev
> *Subject:* [llvm-dev] [GlobalISel] A Proposal for global instruction
> selection
>
>
>
> Hi,
>
> With this email, I would like to kick off the development for the next
> instruction selector that I described during the last LLVM Dev’ Meeting.
> For the motivations, see Jakob’s proposal (
> http://lists.cs.uiuc.edu/pipermail/llvmdev/2013-August/064727.html) and
> for the proposal, see the slides (Keynote:
> http://llvm.org/viewvc/llvm-project/www/trunk/devmtg/2015-10/slides/Colombet-GlobalInstructionSelection.key?view=co
> or PDF:
> http://llvm.org/viewvc/llvm-project/www/trunk/devmtg/2015-10/slides/Colombet-GlobalInstructionSelection.pdf?revision=252430&view=co)
> or the talk (
> https://www.youtube.com/watch?v=F6GGbYtae3g&list=PL_R5A0lGi1AA4Lv2bBFSwhgDaHvvpVU21&index=2
> ).
>
>
> TL;DR This is happening now, feedback invited!
>
> *** Context ***
>
> During the last LLVM Dev’ Meeting, I presented a proposal for the
> next instruction selector, GlobalISel. The proposal is basically summarized
> in “High Level Prototype Design” and “Roadmap”. (If you want further
> details, feel free to reach out to me.)
>
> The first step of the development plan is to prototype the new framework
> in open source. The idea is to *start prototyping now(!)* and have the
> discussion ongoing in parallel. The reason for this approach is to have code
> that can be used to inform those discussions, e.g., by collecting data and
> trying different design approaches. Regarding the discussion, I have
> listed a few points where your feedback would be particularly appreciated
> (see Feedback Invite).
>
>
> Also, as I mentioned in my talk, some issues are controversial but I
> expect them to be resolved during prototype development. Specifically,
> these concern aspects of legalization (should parts of it be done at the
> LLVM IR level or all at the MI level?) and code re-use for the instruction
> combiner. Please feel free to bring up your specific concerns as I move
> along with the development plan.
>
> I expect the design to evolve with our experimental findings and your
> feedback and contributions.
> Nonetheless, we expect to nail down some design decisions once and for all
> as the prototype progresses. I have highlighted them with the following
> pattern *[final]*.
>
>
>
> *** Feedback Invite ***
>
> If you follow and support this work you need to be aware of three things
> and I am eager to hear your feedback and thoughts about them: the overall
> goals of Global ISel, the goals of the prototype, and the impact of the
> prototype work on backend design.
>
> In the section “Goals”, I defined (repeated for people that saw the talk)
> the goals for the Global ISel design.
> - Do you see anything missing?
> - Do you see something that should not be there?
>
> The prototype will answer critical design questions (see “Design Questions
> the Prototype Addresses at the End of M1” for examples) before the actual
> design of Global ISel is finalized, but it cannot cover everything.
> Specifically we will **not** look into improving TableGen or reusing
> InstCombine (see “Proposed Approach” for the rationale). Please let me know
> if you see any issue with that.
>
> There is also basic ground work needed to prepare for Global ISel and I
> need to extend the core MachineInstr-level APIs as explained during the
> talk. For this, I prepared sketches of patches to illustrate them and
> describe the details in the “Implications” section below. Please have a
> look at the patches to have a better idea of the expected impact.
>
> If there is anything else you want to discuss related to Global ISel, feel
> free to reach out to me. In particular, several people expressed interest
> during the LLVM Dev Meeting in contributing to the project. Let me know
> what your area of interest is, so that we can coordinate our efforts.
> Anyhow, please add [GlobalISel] in the subject line to help categorize
> the emails.
>
>
>
> *** Goals ***
>
> The high level goals of the new instruction selector are:
> - Global instruction selector.
> - Fast instruction selector.
> - Shared code path for fast and good instruction selection.
> - IR that represents ISA concepts better.
> - More flexible instruction selector.
> - Easier to maintain/understand framework, in particular legalization.
> - Self contained machine representation, no back links to LLVM IR.
> - No change to LLVM IR.
>
> Note:  The goals are common to all targets. In particular, we do not
> intend to work on target specific feature for the prototype.
> The bottom line is please make sure those goals are compatible with what
> you want to achieve for your target, even if your requirement does not get
> listed here.
>
>
>
> *** Proposed Approach ***
>
> In this section, I describe the approach I plan to pursue in the prototype
> and the roadmap to get there. The final design will flow out of it.
>
> For this prototype, we purposely exclude any work to improve or use
> TableGen or InstCombine *[final].* We will keep in mind, however, that
> some of the C++ code we write will be table-generated at some point.
> The rationale is that we do not want to lay down a new TableGen/InstCombine
> infrastructure before being able to work on the ISel framework itself.
>
> The prototype vehicle will be *AArch64*. None of the changes for
> GlobalISel will negatively impact the existing ISel.
>
>
> ** High Level Prototype Design **
>
> As shown in the talk, the expected pipeline for the prototype is:
> *LLVM IR* -> IRTranslator -> *Generic (G) MachineInstr* -> Legalizer ->
> RegBankSelect -> Select -> *MachineInstr*
>
> Where:
> - Terms in *bold* are intermediate representations.
> -  Generic MachineInstrs are machine instructions with a generic opcode,
> e.g., ADD, COPY.
>
> - IRTranslator: Translate LLVM IR to (G) MachineInstr.
> - Legalizer: Legalize illegal (G) MachineInstr to legal (G) MachineInstr.
> - RegBankSelect: Assign virtual register with size to virtual register
> with Register Bank.
> - Select: Translate the remaining (G) MachineInstr to MachineInstr.
>
>
>
> ** Implications **
>
> As part of the bring-up of the prototype, we need to extend some of the
> core MachineInstr-level APIs:
>   - Need to remember FastMath flags for each MachineInstr.
>   - Need to know the type of each MachineInstr. We don’t want ADD8, ADD16,
> etc.
>   - Extend the MachineRegisterInfo to support size as well as register
> classes for virtual registers.
>
> I have sketched the changes in the attached patches to help picturing how
> the changes would impact the existing APIs.
>
>
>
> Note: I do not intend to commit those changes as they are. They will go
> the usual review process in due time.
>
>
> The patches contain “// ***”-like comment that give a rough explanation on
> why those changes are needed w.r.t. the goals.
> The order of the patches could be modified since the dependencies between
> those are not sequential. Anyhow, here are the patches:
> 1. Introduce (some of) the generic opcode.
> 2. Make MachineFunction more independent of LLVM IR to eventually be able
> to delete the LLVM IR instance from the memory.
> 3. Extend MachineInstr to represent additional information attached to
> generic opcode.
> 4. Teach MachineRegisterInfo about size for virtual registers.
> 5. Introduce a helper class to build MachineInstr related objects.
> 6. Add new target hooks to lower the ABI directly to MachineInstr.
> 7. Introduce the IRTranslator pass.
>
>
> ** Roadmap for the Prototype **
>
> We plan to split the prototype in three main milestones:
> 1. Translation: LLVM IR to (G) MachineInstr translation.
> 2. Basic selector: Legal LLVM IR to target specific MachineInstr.
> 3. Simple legalization: Support scalar type legalization and some vector
> instructions.
>
> Notes:
> - For #1, we will not support any fancy instructions like landing pad or
> switch.
> - Each milestone should take about 3-4 months.
>
> - At the end of #2, we would have a FastISel like selector.
>
> Each milestone will be detailed right before starting it. The rationale is
> that we want to accommodate what we discovered with the prototype for the
> next milestone. In other words, in this email, *I only describe the first
> milestone* in detail and I will give more details on the next milestone
> shortly before we start it and so on. For your information, here is the
> remainder of the intended roadmap for the *full* project:
> 4. Productization: Clean up implementation, stabilize the APIs.
> 5. Complex legalization: Extend legalization support to everything missing.
> 6. Completeness: Fill the blanks, e.g., landing pad.
> 7. Clean-up and performance: Add the necessary bits to be at parity or
> beat SelectionDAG generated code.
> 8. Transition: Document how to switch, provide tools to help.
>
>
> ** Milestone 1 **
>
> The first phase is focused on the IRTranslator pass.
>
> The IRTranslator is responsible for translating the LLVM IR into Generic
> MachineInstr. The IRTranslator pass uses some target hooks to perform the
> ABI lowering. We can either define a new API for them, e.g.,
> ABILoweringInfo, or extend the existing TargetLowering.
> Moreover, the prototype will focus on simple instruction, i.e., we will
> not support switch or landing pad for this iteration.
>
> At the end of M1, the prototype will not be able to produce code, since we
> would only have the beginning of the Global ISel pipeline. Instead, we will
> test the IRTranslator on the generic output that is produced from the
> tested IR.
>
> * Design Decisions *
>
> - The IRTranslator is a final class. Its purpose is to move away from LLVM
> IR to MachineInstr world *[final]*.
> - Lower the ABI as part of the translation process *[final]*.
>
> * Design Questions the Prototype Addresses at the End of M1 *
>
> - Handling of aggregate types during the translation.
> - Lowering of switches.
> - What about Module pass for Machine pass?
> - Introduce new APIs to have a clearer separation between:
>   - Legalization (setOperationAction, etc.)
>   - Cost/Combine related (isXXXFree, etc.)
>   - Lowering related (LowerFormal, etc.)
> - What is the contract with the backends? Is it still “should be able to
> select any valid LLVM IR”?
>
> Thanks,
>
> -Quentin
>
>

Daniel Sanders via llvm-dev

2016-Jan-12 14:37 UTC


[llvm-dev] [GlobalISel] A Proposal for global instruction selection

Thanks, I didn't know about that page. It's a much clearer explanation
of why the backend chooses the code it does. However, there's a bit I'm
trying to explain that isn't covered on that page. I'm trying to explain
why the seemingly contradictory statements at
http://llvm.org/docs/LangRef.html#bitcast-to-instruction don't actually
contradict each other (even for big-endian NEON/MSA) while we're at the
LLVM-IR level and why it's safe for LLVM-IR-level optimizations to use the
zero-instruction definition despite the backend relying on the store/load
definition. It boils down to both definitions being equivalent until we
specialize to a target at which point the two definitions sometimes diverge.
They diverge when the mapping of virtual bits to physical bits differs between
LLVM-IR types.

From: James Molloy [mailto:james at jamesmolloy.co.uk]
Sent: 12 January 2016 13:56
To: Daniel Sanders; Quentin Colombet
Cc: llvm-dev
Subject: Re: [llvm-dev] [GlobalISel] A Proposal for global instruction selection

Hi,
> I found this thinking quite difficult to explain. Does it make sense?

It might help to link to the documentation on why bitcasts are weird on
big-endian NEON: http://llvm.org/docs/BigEndianNEON.html#bitconverts

Cheers,

James

On Tue, 12 Jan 2016 at 13:23 Daniel Sanders via llvm-dev <llvm-dev at
lists.llvm.org> wrote:
Hi,

I haven't found much time to look into the LLVM-IR-level optimizations yet
so I'm not sure how they handle bitcasts. With that disclaimer in mind, I
expect it's fine for the LLVM-IR-level optimizations to handle them using
either definition since they are equivalent at the LLVM-IR level. My thinking is
that LLVM-IR is consistent about how virtual bits are assigned to types and that
no-ops requiring a non-zero number of instructions arise when there is
inconsistency.

At the LLVM-IR level, bits 0-127 of <4 x i32> map directly onto bits 0-127
of <2 x i64> using the identity map. It's therefore ok to interpret
such bitcasts as zero-instruction no-ops. As far as I can tell, LLVM-IR has been
defined such that the identity map can be used for bitcasts between all
same-sized types, and also such that bitcasting between different-sized types is
invalid.

Similarly, most targets have a single mapping of virtual bit numbers to physical
bit numbers for each size that is applied consistently when mapping a type to
memory. For example 32-bits map like so:
Little Endian Targets: virtual register bits {0..7,8..15,16..23,24..31} map to
physical memory bits {0..7,8..15,16..23,24..31}
Big Endian Targets: virtual register bits {0..7,8..15,16..23,24..31} map to
physical memory bits {24..31,16..23,8..15,0..7}
regardless of whether it's a float, or an i32. We therefore need zero
instructions to re-map physical memory bits for one type onto another type.
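
As a rough illustration of the 32-bit case above, the store/load definition can be modelled with Python's standard `struct` module. The helper name is made up for this sketch. It only shows that, for a scalar, storing a float and reloading the same bytes as an i32 yields the same value under either byte order, which is why no instructions are needed.

```python
import struct


def bitcast_f32_to_i32_via_memory(value, byte_order):
    """Model 'store as float, load as i32' (hypothetical helper).

    byte_order is '<' for little-endian or '>' for big-endian.
    """
    memory = struct.pack(byte_order + "f", value)      # store as float
    return struct.unpack(byte_order + "I", memory)[0]  # reload as i32


little = bitcast_f32_to_i32_via_memory(1.0, "<")
big = bitcast_f32_to_i32_via_memory(1.0, ">")

# Both byte orders apply one consistent mapping to float and i32 alike,
# so the store/load view agrees with the zero-instruction view.
assert little == big == 0x3F800000
```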

The same idea holds for physical register classes. There's a single
consistent mapping from physical memory bits to physical register bits that
applies for all types that can be stored in that class. As long as this is the
case the load/store and zero-instruction interpretation of bitcasts are
equivalent.
In the case of big-endian MSA and NEON, there isn't a single consistent
mapping from physical memory bits to physical register bits so the equivalence
in the two definitions breaks down:
                i128: virtual register bits {0..31, 32..63, 64..95, 96..127}
map to physical memory bits {96..127, 64..95, 32..63, 0..31}
                <4 x i32>: virtual register bits {0..31, 32..63, 64..95, 96..127}
map to physical memory bits {0..31, 32..63, 64..95, 96..127}
                <2 x i64>: virtual register bits {0..31, 32..63, 64..95, 96..127}
map to physical memory bits {32..63, 0..31, 96..127, 64..95}
with these inconsistent mappings we require instructions to bitcast between the
types.
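
One way to read the three mappings above is as permutations of 32-bit chunks. The sketch below is only a model at that granularity, not LLVM code. The table is transcribed from the message, and the function names are invented for illustration. It derives which source register chunk each destination register chunk must come from if a bitcast is to behave like a store followed by a load.

```python
# For each type: the memory chunk that each 32-bit register chunk maps to,
# lowest register chunk first (transcribed from the mappings above).
MEMORY_CHUNK = {
    "i128": [3, 2, 1, 0],
    "<4 x i32>": [0, 1, 2, 3],
    "<2 x i64>": [1, 0, 3, 2],
}


def bitcast_shuffle(src_type, dst_type):
    """For each destination register chunk, return the source register chunk
    that a store (as src_type) followed by a load (as dst_type) puts there."""
    src_map = MEMORY_CHUNK[src_type]
    dst_map = MEMORY_CHUNK[dst_type]
    # Invert the source mapping: memory chunk -> source register chunk.
    memory_to_src = [src_map.index(m) for m in range(4)]
    return [memory_to_src[m] for m in dst_map]


def needs_instructions(src_type, dst_type):
    """True when the shuffle is not the identity, i.e. the bitcast costs code."""
    return bitcast_shuffle(src_type, dst_type) != [0, 1, 2, 3]


# <4 x i32> -> <2 x i64> swaps adjacent 32-bit chunks in this model.
assert bitcast_shuffle("<4 x i32>", "<2 x i64>") == [1, 0, 3, 2]
assert not needs_instructions("<4 x i32>", "<4 x i32>")
```

In this model the identity shuffle corresponds to the zero-instruction interpretation, and any other result corresponds to a case where the two definitions diverge.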

I found this thinking quite difficult to explain. Does it make sense?
> I am fine with treating bit casts as equivalent store/load pairs in GISel,
I just want to be sure we do not have a semantic gap between the LLVM-IR and the
backend if we do.
I think a gap would arise from not having a GISel equivalent to ISD::BITCAST
(gBITCAST?) available when it's necessary for correctness. However, I agree
that GISel should delete bitcasts for the common case where the store/load and
zero-instruction definitions are equivalent.

From: Quentin Colombet [mailto:qcolombet at apple.com]
Sent: 11 January 2016 17:23
To: Daniel Sanders
Cc: Tim Northover (t.p.northover at gmail.com); llvm-dev

Subject: Re: [llvm-dev] [GlobalISel] A Proposal for global instruction selection

Hi Daniel,

Thanks for the pointers, I wasn’t aware of the second thread you’ve mentioned.

I may be wrong but I think LLVM-IR optimizations really treat bitcasts as no-op
casts, in the sense that no instructions are required.

Is there anyone that could chime in on that?

However, it seems SelectionDAG sticks to the load/store semantic:
"BITCAST - This operator converts between integer, vector and FP values, as
if the value was stored to memory with one type and loaded from the same address
with the other type (or equivalently for vector format conversions, etc)."

I am fine with treating bit casts as equivalent store/load pairs in GISel, I
just want to be sure we do not have a semantic gap between the LLVM-IR and the
backend if we do.

Thanks,
-Quentin

On Jan 11, 2016, at 7:43 AM, Daniel Sanders <Daniel.Sanders at imgtec.com> wrote:

Hi,

It was a comment by Tim that first made me aware of it (see
http://lists.llvm.org/pipermail/llvm-dev/2013-August/064714.html but I think he
commented on one of my patches before that).

I asked about it on llvm-dev a couple weeks later
(http://lists.llvm.org/pipermail/llvm-dev/2013-August/064919.html) highlighting
the contradiction and was told that 'no-op cast' referred to the lack of
math rather than a requirement that zero instructions are used. It's
therefore my understanding that shuffling the bits to preserve the load/store
based definition isn't considered to be changing the bits.

I think the main thing the current definition is unclear on is whether it refers
to the bits in a physical machine register or the bits in the LLVM-IR virtual
register. Most of the time these two views are the same but this doesn't
quite work for big-endian MSA/NEON. For example:
%0 = bitcast <4 x i32> <i32 1, i32 2, i32 3, i32 4> to <2 x i64>
%0 = <2 x i64> <i64 (1 << 32) | 2, i64 (3 << 32) | 4>
are equivalent to each other in LLVM-IR terms but the constants are physically
laid out in MSA registers as:
0x00000004000000030000000200000001 # <4 x i32> <i32 1, i32 2, i32 3, i32 4>
0x00000003000000040000000100000002 # <2 x i64> <i64 (1 << 32) | 2, i64 (3 << 32) | 4>
and we must therefore shuffle the bits to preserve LLVM-IR's point of view.
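
The constants in this example can be checked with a small Python model. It rests on two assumptions that are mine rather than anything stated by LLVM: a big-endian vector store writes elements in element order with big-endian bytes inside each element, and lane 0 sits in the least significant bits of the register image, which is how the hex values above appear to be written. All function names are illustrative.

```python
def store_vector(elements, element_bits):
    """Write elements in element order, each one big-endian (assumed model)."""
    nbytes = element_bits // 8
    return b"".join(e.to_bytes(nbytes, "big") for e in elements)


def load_vector(memory, element_bits):
    """Read big-endian elements of the given width back from memory."""
    nbytes = element_bits // 8
    return [
        int.from_bytes(memory[i:i + nbytes], "big")
        for i in range(0, len(memory), nbytes)
    ]


def register_image(elements, element_bits):
    """Pack lanes into one integer with lane 0 in the low bits (assumed)."""
    value = 0
    for lane, element in enumerate(elements):
        value |= element << (lane * element_bits)
    return value


src = [1, 2, 3, 4]                              # <4 x i32>
dst = load_vector(store_vector(src, 32), 64)    # bitcast via store/load

assert dst == [(1 << 32) | 2, (3 << 32) | 4]    # <2 x i64>
assert f"{register_image(src, 32):032x}" == "00000004000000030000000200000001"
assert f"{register_image(dst, 64):032x}" == "00000003000000040000000100000002"
```

The two register images differ, so under these assumptions the bitcast cannot be a zero-instruction no-op on such a target.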

From: Quentin Colombet [mailto:qcolombet at apple.com]
Sent: 07 January 2016 19:58
To: Daniel Sanders
Cc: llvm-dev
Subject: Re: [llvm-dev] [GlobalISel] A Proposal for global instruction selection

Hi Daniel,

I had a quick look at the language reference for bitcast and I have a different
reading than what you were pointing out.
Indeed, my take away is:
"It is always a no-op cast because no bits change with this
conversion."

In other words, deleting all bitcast instructions should be fine.

My understanding of the quote you’ve highlighted is that it tells C programmers
that this is like a memcpy, not a cast :).

Cheers,
-Quentin
On Nov 20, 2015, at 6:53 AM, Daniel Sanders <Daniel.Sanders at imgtec.com> wrote:

Hi,

I haven't had a chance to read all of this yet, but one minor thing occurred
to me during your presentation that I want to mention. At one point you
mentioned deleting all the bitcast instructions since they're equivalent to
nops but this isn't always true.

The http://llvm.org/docs/LangRef.html definition of the bitcast instruction
includes this sentence:
The conversion is done as if the value had been stored to memory and read back
as type ty2.
For big-endian MSA, this is equivalent to a shuffling of the bits in the
register because endianness only changes the byte order within each element. The
order of the elements is unaffected by endianness. IIRC, big-endian NEON is the
same way.

From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of Quentin
Colombet via llvm-dev
Sent: 18 November 2015 19:27
To: llvm-dev
Subject: [llvm-dev] [GlobalISel] A Proposal for global instruction selection

Hi,

With this email, I would like to kick-off the development for the next
instruction selector that I described during the last LLVM Dev’ Meeting.
For the motivations, see Jakob’s proposal
(http://lists.cs.uiuc.edu/pipermail/llvmdev/2013-August/064727.html) and for the
proposal, see the slides (Keynote:
http://llvm.org/viewvc/llvm-project/www/trunk/devmtg/2015-10/slides/Colombet-GlobalInstructionSelection.key?view=co
or PDF:
http://llvm.org/viewvc/llvm-project/www/trunk/devmtg/2015-10/slides/Colombet-GlobalInstructionSelection.pdf?revision=252430&view=co)
or the talk
(https://www.youtube.com/watch?v=F6GGbYtae3g&list=PL_R5A0lGi1AA4Lv2bBFSwhgDaHvvpVU21&index=2).

TL;DR This is happening now, feedback invited!

*** Context ***

During the last LLVM Dev’ Meeting, I presented a proposal for the next
instruction selector, GlobalISel. The proposal is basically summarized in
“High Level Prototype Design” and “Roadmap”. (If you want further details,
feel free to reach out to me.)

The first step of the development plan is to prototype the new framework in open
source. The idea is to start prototyping now(!) and have the discussion ongoing
in parallel. The reason for this approach is to have code that can be used to
inform those discussions, e.g., by collecting data and trying different design
approaches. Regarding the discussion, I have listed a few points where your
feedback would be particularly appreciated (see Feedback Invite).

Also, as I mentioned in my talk, some issues are controversial but I expect
them to be resolved during prototype development. Specifically, these concern
aspects of legalization (should parts of it be done at the LLVM IR level or all
at the MI level?) and code re-use for the instruction combiner. Please feel free
to bring up your specific concerns as I move along with the development plan.

I expect the design to evolve with our experimental findings and your feedback
and contributions.
Nonetheless, we expect to nail down some design decisions once and for all as
the prototype progresses. I have highlighted them with the following pattern
[final].

*** Feedback Invite ***

If you follow and support this work you need to be aware of three things and I
am eager to hear your feedback and thoughts about them: the overall goals of
Global ISel, the goals of the prototype, and the impact of the prototype work on
backend design.

In the section “Goals”, I defined (repeated for people that saw the talk)
the goals for the Global ISel design.
- Do you see anything missing?
- Do you see something that should not be there?

The prototype will answer critical design questions (see “Design Questions the
Prototype Addresses at the End of M1” for examples) before the actual
design of Global ISel is finalized, but it cannot cover everything.
Specifically we will *not* look into improving TableGen or reusing InstCombine
(see “Proposed Approach” for the rationale). Please let me know if you see any
issue with that.

There is also basic ground work needed to prepare for Global ISel and I need to
extend the core MachineInstr-level APIs as explained during the talk. For this,
I prepared sketches of patches to illustrate them and describe the details in
the “Implications” section below. Please have a look at the patches to have a
better idea of the expected impact.

If there is anything else you want to discuss related to Global ISel, feel free
to reach out to me. In particular, several people expressed interest during the
LLVM Dev Meeting in contributing to the project. Let me know what your area
of interest is, so that we can coordinate our efforts.
Anyhow, please add [GlobalISel] in the subject line to help categorize the
emails.

*** Goals ***

The high level goals of the new instruction selector are:
- Global instruction selector.
- Fast instruction selector.
- Shared code path for fast and good instruction selection.
- IR that represents ISA concepts better.
- More flexible instruction selector.
- Easier to maintain/understand framework, in particular legalization.
- Self contained machine representation, no back links to LLVM IR.
- No change to LLVM IR.

Note:  The goals are common to all targets. In particular, we do not intend to
work on target specific feature for the prototype.
The bottom line is please make sure those goals are compatible with what you
want to achieve for your target, even if your requirement does not get listed
here.

*** Proposed Approach ***

In this section, I describe the approach I plan to pursue in the prototype and
the roadmap to get there. The final design will flow out of it.

For this prototype, we purposely exclude any work to improve or use TableGen or
InstCombine [final]. We will keep in mind, however, that some of the C++ code we
write will be table-generated at some point.
The rationale is that we do not want to lay down a new TableGen/InstCombine
infrastructure before being able to work on the ISel framework itself.

The prototype vehicle will be AArch64. None of the changes for GlobalISel will
negatively impact the existing ISel.

** High Level Prototype Design **

As shown in the talk, the expected pipeline for the prototype is:
LLVM IR -> IRTranslator -> Generic (G) MachineInstr -> Legalizer ->
RegBankSelect -> Select -> MachineInstr

Where:
- Terms in bold (LLVM IR, Generic (G) MachineInstr, MachineInstr) are
intermediate representations.
- Generic MachineInstrs are machine instructions with a generic opcode, e.g.,
ADD, COPY.
- IRTranslator: Translate LLVM IR to (G) MachineInstr.
- Legalizer: Legalize illegal (G) MachineInstr to legal (G) MachineInstr.
- RegBankSelect: Map each virtual register that only has a size to a virtual
register with a Register Bank.
- Select: Translate the remaining (G) MachineInstr to MachineInstr.
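
The pipeline above can be pictured as a chain of small passes over a list of
instructions. The following is only a toy Python sketch, not LLVM code: the
`GInstr` record, the `G_` and `TARGET_` opcode names, the legal sizes, and the
single `GPR` bank are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, replace
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class GInstr:
    """Hypothetical stand-in for a (G) MachineInstr."""
    opcode: str
    size_bits: int
    reg_bank: Optional[str] = None


def ir_translator(ir_ops: List[Tuple[str, int]]) -> List[GInstr]:
    # LLVM IR -> generic opcodes; the type is kept as a size, not in the opcode.
    return [GInstr("G_" + op.upper(), bits) for op, bits in ir_ops]


def legalizer(instrs: List[GInstr], legal_sizes=(32, 64)) -> List[GInstr]:
    # Widen each illegal size to the smallest legal one (narrowing is not modeled).
    out = []
    for mi in instrs:
        fits = [s for s in legal_sizes if s >= mi.size_bits]
        if not fits:
            raise ValueError(f"cannot legalize {mi.size_bits}-bit {mi.opcode}")
        out.append(replace(mi, size_bits=min(fits)))
    return out


def reg_bank_select(instrs: List[GInstr]) -> List[GInstr]:
    # Attach a register bank to every instruction (only one bank in this toy).
    return [replace(mi, reg_bank="GPR") for mi in instrs]


def select(instrs: List[GInstr]) -> List[GInstr]:
    # Generic opcode + size -> made-up target opcode, e.g. G_ADD/32 -> TARGET_ADD32.
    return [replace(mi, opcode=f"TARGET_{mi.opcode[2:]}{mi.size_bits}")
            for mi in instrs]


def run_pipeline(ir_ops: List[Tuple[str, int]]) -> List[GInstr]:
    return select(reg_bank_select(legalizer(ir_translator(ir_ops))))
```

For example, `run_pipeline([("add", 8), ("add", 64)])` widens the 8-bit add to
32 bits before selection, which is why a single generic ADD is enough and no
ADD8/ADD16 opcodes are needed in this model.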

** Implications **

As part of the bring-up of the prototype, we need to extend some of the core
MachineInstr-level APIs:
  - Need to remember FastMath flags for each MachineInstr.
  - Need to know the type of each MachineInstr. We don’t want ADD8, ADD16, etc.
  - Extend the MachineRegisterInfo to support size as well as register classes
for virtual registers.

I have sketched the changes in the attached patches to help picture how the
changes would impact the existing APIs.

Note: I do not intend to commit those changes as they are. They will go through
the usual review process in due time.

The patches contain “// ***”-like comments that give a rough explanation of why
those changes are needed w.r.t. the goals.
The order of the patches could be modified since the dependencies between them
are not sequential. Anyhow, here are the patches:
1. Introduce (some of) the generic opcodes.
2. Make MachineFunction more independent of LLVM IR to eventually be able to
delete the LLVM IR instance from memory.
3. Extend MachineInstr to represent additional information attached to generic
opcodes.
4. Teach MachineRegisterInfo about size for virtual registers.
5. Introduce a helper class to build MachineInstr related objects.
6. Add new target hooks to lower the ABI directly to MachineInstr.
7. Introduce the IRTranslator pass.

** Roadmap for the Prototype **

We plan to split the prototype into three main milestones:
1. Translation: LLVM IR to (G) MachineInstr translation.
2. Basic selector: Legal LLVM IR to target-specific MachineInstr.
3. Simple legalization: Support scalar type legalization and some vector
instructions.

Notes:
- For #1, we will not support any fancy instructions like landing pad or switch.
- Each milestone should take about 3-4 months.
- At the end of #2, we would have a FastISel-like selector.

Each milestone will be detailed right before starting it. The rationale is that
we want to accommodate what we discover with the prototype in the next
milestone. In other words, in this email, I only describe the first milestone in
detail, and I will give more details on the next milestone shortly before we
start it, and so on. For your information, here is the remainder of the intended
roadmap for the full project:
4. Productization: Clean up implementation, stabilize the APIs.
5. Complex legalization: Extend legalization support to everything missing.
6. Completeness: Fill the blanks, e.g., landing pad.
7. Clean-up and performance: Add the necessary bits to be at parity or beat
SelectionDAG generated code.
8. Transition: Document how to switch, provide tools to help.

** Milestone 1 **

The first phase is focused on the IRTranslator pass.

The IRTranslator is responsible for translating the LLVM IR into Generic
MachineInstr. The IRTranslator pass uses some target hooks to perform the ABI
lowering. We can either define a new API for them, e.g., ABILoweringInfo, or
extend the existing TargetLowering.
Moreover, the prototype will focus on simple instructions, i.e., we will not
support switch or landing pad for this iteration.

At the end of M1, the prototype will not be able to produce code, since we would
only have the beginning of the Global ISel pipeline. Instead, we will test the
IRTranslator on the generic output that is produced from the tested IR.

* Design Decisions *

- The IRTranslator is a final class. Its purpose is to move away from LLVM IR to
MachineInstr world [final].
- Lower the ABI as part of the translation process [final].

* Design Questions the Prototype Addresses at the End of M1 *

- Handling of aggregate types during the translation.
- Lowering of switches.
- What about Module pass for Machine pass?
- Introduce new APIs to have a clearer separation between:
  - Legalization (setOperationAction, etc.)
  - Cost/Combine related (isXXXFree, etc.)
  - Lowering related (LowerFormal, etc.)
- What is the contract with the backends? Is it still “should be able to select
any valid LLVM IR”?

Thanks,
-Quentin


Mehdi Amini via llvm-dev

2016-Jan-12 16:46 UTC

[llvm-dev] [GlobalISel] A Proposal for global instruction selection

What happens when you cascade bitcasts?
Are these sequences all equivalent at the IR level (i.e. do they reference the
same byte from the original i128)?

i128  =>  <16 x i8>  =>  GEP 0
i128  =>  <2 x i64>  =>  GEP 0  =>  <8 x i8>   =>  GEP 0
i128  =>  <2 x i64>  =>  GEP 0  =>  <2 x i32>  =>  GEP 0  =>  <4 x i8>  =>  GEP 0
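
One way to frame the question is to model each bitcast with the store/load
definition from the LangRef and read "GEP 0" as "take element 0". The Python
sketch below does only that, under the assumption of a single uniform memory
byte order per target; the helper names are made up for illustration, and it
says nothing about register layouts such as big-endian NEON/MSA.

```python
def bitcast(elems, src_size, dst_size, byteorder):
    """Store `elems` (src_size bytes each) to memory, reload as dst_size-byte elements."""
    mem = b"".join(e.to_bytes(src_size, byteorder) for e in elems)
    return [int.from_bytes(mem[i:i + dst_size], byteorder)
            for i in range(0, len(mem), dst_size)]


def first_byte(chain, value, byteorder):
    """Start from an i128, then repeatedly bitcast and keep element 0 ('GEP 0')."""
    elem, size = value, 16
    for dst_size in chain:
        elem = bitcast([elem], size, dst_size, byteorder)[0]
        size = dst_size
    return elem


# An i128 whose bytes, from most to least significant, are 0x00, 0x01, ..., 0x0f.
VALUE = int.from_bytes(bytes(range(16)), "big")

# The three sequences above, written as the element size in bytes after each cast.
CHAINS = ([1], [8, 1], [8, 4, 1])

for order in ("little", "big"):
    print(order, {first_byte(chain, VALUE, order) for chain in CHAINS})
```

In this model all three chains pick the same byte for a given byte order (the
least significant byte on little-endian, the most significant on big-endian),
because every cast goes through the same memory image.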


— 
Mehdi


> On Jan 12, 2016, at 6:37 AM, Daniel Sanders via llvm-dev <llvm-dev at
lists.llvm.org> wrote:
> 
> Thanks, I didn't know about that page. It's a much clearer explanation of
> why the backend chooses the code it does. However, there's a bit I'm trying
> to explain that isn't covered on that page. I'm trying to explain why the
> seemingly contradictory statements at
> http://llvm.org/docs/LangRef.html#bitcast-to-instruction don't actually
> contradict each other (even for big-endian NEON/MSA) while we're at the
> LLVM-IR level and why it's safe for LLVM-IR-level optimizations to use the
> zero-instruction definition despite the backend relying on the store/load
> definition. It boils down to both definitions being equivalent until we
> specialize to a target, at which point the two definitions sometimes diverge.
> They diverge when the mapping of virtual bits to physical bits differs between
> LLVM-IR types.
>  
> From: James Molloy [mailto:james at jamesmolloy.co.uk] 
> Sent: 12 January 2016 13:56
> To: Daniel Sanders; Quentin Colombet
> Cc: llvm-dev
> Subject: Re: [llvm-dev] [GlobalISel] A Proposal for global instruction
selection
>  
> Hi,
>  
> > I found this thinking quite difficult to explain. Does it make sense?
> It might help to link to the documentation on why bitcasts are weird on
> big-endian NEON: http://llvm.org/docs/BigEndianNEON.html#bitconverts
>  
> Cheers,
>  
> James
>  
> On Tue, 12 Jan 2016 at 13:23 Daniel Sanders via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
> Hi,
>  
> I haven't found much time to look into the LLVM-IR-level optimizations
yet so I'm not sure how they handle bitcasts. With that disclaimer in mind,
I expect it's fine for the LLVM-IR level optimizations to handle them using
either definition since they are equivalent at the LLVM-IR level. My thinking is
that LLVM-IR is consistent about how virtual bits are assigned to types and that
non-zero instruction nops arise when there is inconsistency.
>  
> At the LLVM-IR level, bits 0-127 of <4 x i32> map directly onto bits
0-127 of <2 x i64> using the identity map. It's therefore ok to
interpret such bitcasts as zero-instruction no-ops. As far as I can tell,
LLVM-IR has been defined such that the identity map can be used for bitcasts
between all same-sized types, and also such that bitcasting between
different-sized types is invalid.
>  
> Similarly, most targets have a single mapping of virtual bit numbers to
physical bit numbers for each size that is applied consistently when mapping a
type to memory. For example 32-bits map like so:
> Little Endian Targets: virtual register bits {0..7,8..15,16..23,24..31} map
to physical memory bits {0..7,8..15,16..23,24..31}
> Big Endian Targets: virtual register bits {0..7,8..15,16..23,24..31} map to
physical memory bits {24..31,16..23,8..15,0..7}
> regardless of whether it's a float, or an i32. We therefore need zero
instructions to re-map physical memory bits for one type onto another type.
>  
> The same idea holds for physical register classes. There's a single
consistent mapping from physical memory bits to physical register bits that
applies for all types that can be stored in that class. As long as this is the
case the load/store and zero-instruction interpretation of bitcasts are
equivalent.
> In the case of big-endian MSA and NEON, there isn't a single consistent
> mapping from physical memory bits to physical register bits so the equivalence
> in the two definitions breaks down:
>     i128: virtual register bits {0..31, 32..63, 64..95, 96..127} map to
>     physical memory bits {96..127, 64..95, 32..63, 0..31}
>     <4 x i32>: virtual register bits {0..31, 32..63, 64..95, 96..127} map to
>     physical memory bits {0..31, 32..63, 64..95, 96..127}
>     <2 x i64>: virtual register bits {0..31, 32..63, 64..95, 96..127} map to
>     physical memory bits {32..63, 0..31, 96..127, 64..95}
> With these inconsistent mappings we require instructions to bitcast between
> the types.
>  
> I found this thinking quite difficult to explain. Does it make sense?
>  
> > I am fine with treating bit casts as equivalent store/load pairs in
GISel, I just want to be sure we do not have a semantic gap between the LLVM-IR
and the backend if we do.
>  
> I think a gap would arise from not having a GISel equivalent to
ISD::BITCAST (gBITCAST?) available when it's necessary for correctness.
However, I agree that GISel should delete bitcasts for the common case where the
store/load and zero-instruction definitions are equivalent.
>  
> From: Quentin Colombet [mailto:qcolombet at apple.com]
> Sent: 11 January 2016 17:23
> To: Daniel Sanders
> Cc: Tim Northover (t.p.northover at gmail.com); llvm-dev
> 
> Subject: Re: [llvm-dev] [GlobalISel] A Proposal for global instruction
selection
>  
> Hi Daniel,
>  
> Thanks for the pointers, I wasn’t aware of the second thread you’ve
mentioned.
>  
> I may be wrong but I think LLVM-IR optimizations really treat bitcasts as
> no-op casts, in the sense that no instructions are required.
>  
> Is there anyone that could chime in on that?
>  
> However, it seems SelectionDAG sticks to the load/store semantic:
> "BITCAST - This operator converts between integer, vector and FP
values, as if the value was stored to memory with one type and loaded from the
same address with the other type (or equivalently for vector format conversions,
etc)."
>  
> I am fine with treating bit casts as equivalent store/load pairs in GISel,
I just want to be sure we do not have a semantic gap between the LLVM-IR and the
backend if we do.
>  
> Thanks,
> -Quentin
>  
> On Jan 11, 2016, at 7:43 AM, Daniel Sanders
> <Daniel.Sanders at imgtec.com> wrote:
>  
> Hi,
>  
> It was a comment by Tim that first made me aware of it (see
> http://lists.llvm.org/pipermail/llvm-dev/2013-August/064714.html but I
> think he commented on one of my patches before that).
>  
> I asked about it on llvm-dev a couple of weeks later
> (http://lists.llvm.org/pipermail/llvm-dev/2013-August/064919.html),
> highlighting the contradiction, and was told that 'no-op cast' referred
> to the lack of math rather than a requirement that zero instructions are used.
> It's therefore my understanding that shuffling the bits to preserve the
> load/store based definition isn't considered to be changing the bits.
>  
> I think the main thing the current definition is unclear on is whether it
refers to the bits in a physical machine register or the bits in the LLVM-IR
virtual register. Most of the time these two views are the same but this
doesn't quite work for big-endian MSA/NEON. For example:
> %0 = bitcast <4 x i32> <i32 1, i32 2, i32 3, i32 4> to <2 x i64>
> %0 = <2 x i64> <i64 (1 << 32) | 2, i64 (3 << 32) | 4>
> are equivalent to each other in LLVM-IR terms but the constants are
> physically laid out in MSA registers as:
> 0x00000004000000030000000200000001 # <4 x i32> <i32 1, i32 2, i32 3, i32 4>
> 0x00000003000000040000000100000002 # <2 x i64> <i64 (1 << 32) | 2, i64 (3 << 32) | 4>
> and we must therefore shuffle the bits to preserve LLVM-IR's point of
> view.
>  
> From: Quentin Colombet [mailto:qcolombet at apple.com]
> Sent: 07 January 2016 19:58
> To: Daniel Sanders
> Cc: llvm-dev
> Subject: Re: [llvm-dev] [GlobalISel] A Proposal for global instruction
selection
>  
> Hi Daniel,
>  
> I had a quick look at the language reference for bitcast and I have a
different reading than what you were pointing out.
> Indeed, my take away is:
> "It is always a no-op cast because no bits change with this
conversion."
>  
> In other words, deleting all bitcast instructions should be fine.
>  
> My understanding of the quote you’ve highlighted is that it tells C
programmers that this is like a memcpy, not a cast :).
>  
> Cheers,
> -Quentin
> On Nov 20, 2015, at 6:53 AM, Daniel Sanders
> <Daniel.Sanders at imgtec.com> wrote:
>  
> Hi,
>  
> I haven't had a chance to read all of this yet, but one minor thing
> occurred to me during your presentation that I want to mention. At one point
> you mentioned deleting all the bitcast instructions since they're equivalent
> to nops but this isn't always true.
>  
> The http://llvm.org/docs/LangRef.html definition of the bitcast instruction
> includes this sentence:
>     The conversion is done as if the value had been stored to memory and read
>     back as type ty2.
> For big-endian MSA, this is equivalent to a shuffling of the bits in the
register because endianness only changes the byte order within each element. The
order of the elements is unaffected by endianness. IIRC, big-endian NEON is the
same way.
>  
> From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of
> Quentin Colombet via llvm-dev
> Sent: 18 November 2015 19:27
> To: llvm-dev
> Subject: [llvm-dev] [GlobalISel] A Proposal for global instruction
selection
>  
> Hi,
> 
> With this email, I would like to kick off the development of the next
> instruction selector that I described during the last LLVM Dev’ Meeting.
> For the motivations, see Jakob’s proposal
> (http://lists.cs.uiuc.edu/pipermail/llvmdev/2013-August/064727.html) and
> for the proposal, see the slides (Keynote:
> http://llvm.org/viewvc/llvm-project/www/trunk/devmtg/2015-10/slides/Colombet-GlobalInstructionSelection.key?view=co
> or PDF:
> http://llvm.org/viewvc/llvm-project/www/trunk/devmtg/2015-10/slides/Colombet-GlobalInstructionSelection.pdf?revision=252430&view=co)
> or the talk
> (https://www.youtube.com/watch?v=F6GGbYtae3g&list=PL_R5A0lGi1AA4Lv2bBFSwhgDaHvvpVU21&index=2).
> 
> TL;DR This is happening now, feedback invited!
> 
> *** Context ***
> 
> During the last LLVM Dev’ Meeting, I presented a proposal for the next
> instruction selector, GlobalISel. The proposal is basically summarized in
> “High Level Prototype Design” and “Roadmap”. (If you want further details,
> feel free to reach out to me.)
> 
> The first step of the development plan is to prototype the new framework in
> open source. The idea is to start prototyping now(!) and have the discussion
> ongoing in parallel. The reason for such an approach is to have code that can
> be used to inform those discussions, e.g., by collecting data and trying
> different design approaches. Regarding the discussion, I have listed a few
> points where your feedback would be particularly appreciated (see Feedback
> Invite).
> 
> Also, as I mentioned in my talk, some issues are controversial but I
> expect them to be resolved during prototype development. Specifically, these
> concern aspects of legalization (should parts of it be done at the LLVM IR
> level or all at the MI level?) and code reuse for the instruction combiner.
> Please feel free to bring up your specific concerns as I move along with the
> development plan.
> 
> I expect the design to evolve with our experimental findings and your
> feedback and contributions.
> Nonetheless, we expect to nail down some design decisions once and for all
> as the prototype progresses. I have highlighted them with the following
> pattern: [final].
> 
> 
> 
> *** Feedback Invite ***
> 
> If you follow and support this work you need to be aware of three things
and I am eager to hear your feedback and thoughts about them: the overall goals
of Global ISel, the goals of the prototype, and the impact of the prototype work
on backend design.
> 
> In the section “Goals”, I defined (repeated for people that saw the
> talk) the goals for the Global ISel design.
> - Do you see anything missing?
> - Do you see something that should not be there?
> 
> The prototype will answer critical design questions (see “Design Questions
> the Prototype Addresses at the End of M1” for examples) before the actual
> design of Global ISel is finalized, but it cannot cover everything.
> Specifically, we will *not* look into improving TableGen or reusing
> InstCombine (see “Proposed Approach” for the rationale). Please let me know if
> you see any issue with that.
> 
> There is also basic ground work needed to prepare for Global ISel and I
> need to extend the core MachineInstr-level APIs as explained during the talk.
> For this, I prepared sketches of patches to illustrate them and describe the
> details in the “Implications” section below. Please have a look at the patches
> to have a better idea of the expected impact.
> 
> If there is anything else you want to discuss related to Global ISel, feel
> free to reach out to me. In particular, several people expressed interest
> during the LLVM Dev Meeting in contributing to the project. Let me know what
> your area of interest is, so that we can coordinate our efforts.
> Anyhow, please add [GlobalISel] in the subject line to help categorize
> the emails.
> 
> 
> 
> *** Goals ***
> 
> The high level goals of the new instruction selector are:
> - Global instruction selector.
> - Fast instruction selector.
> - Shared code path for fast and good instruction selection.
> - IR that represents ISA concepts better.
> - More flexible instruction selector.
> - Easier to maintain/understand framework, in particular legalization.
> - Self contained machine representation, no back links to LLVM IR.
> - No change to LLVM IR.
> 
> Note: The goals are common to all targets. In particular, we do not intend
> to work on target-specific features for the prototype.
> The bottom line is: please make sure those goals are compatible with what
> you want to achieve for your target, even if your requirement is not
> listed here.
> 
> 
> 
> *** Proposed Approach ***
> 
> In this section, I describe the approach I plan to pursue in the prototype
and the roadmap to get there. The final design will flow out of it.
> 
> For this prototype, we purposely exclude any work to improve or use
> TableGen or InstCombine [final]. We will keep in mind, however, that some of
> the C++ code we write will be table-generated at some point.
> The rationale is that we do not want to lay down a new TableGen/InstCombine
> infrastructure before being able to work on the ISel framework itself.
> 
> The prototype vehicle will be AArch64. None of the changes for GlobalISel
will negatively impact the existing ISel.
> 
> 
> ** High Level Prototype Design **
> 
> As shown in the talk, the expected pipeline for the prototype is:
> LLVM IR -> IRTranslator -> Generic (G) MachineInstr -> Legalizer ->
> RegBankSelect -> Select -> MachineInstr
> 
> Where:
> - Terms in bold (LLVM IR, Generic (G) MachineInstr, MachineInstr) are
> intermediate representations.
> - Generic MachineInstrs are machine instructions with a generic opcode,
> e.g., ADD, COPY.
> - IRTranslator: Translate LLVM IR to (G) MachineInstr.
> - Legalizer: Legalize illegal (G) MachineInstr to legal (G) MachineInstr.
> - RegBankSelect: Map each virtual register that only has a size to a virtual
> register with a Register Bank.
> - Select: Translate the remaining (G) MachineInstr to MachineInstr.
> 
> 
> 
> ** Implications **
> 
> As part of the bring-up of the prototype, we need to extend some of the
core MachineInstr-level APIs:
>   - Need to remember FastMath flags for each MachineInstr.
>   - Need to know the type of each MachineInstr. We don’t want ADD8, ADD16,
etc.
>   - Extend the MachineRegisterInfo to support size as well as register
classes for virtual registers.
> 
> I have sketched the changes in the attached patches to help picture how
> the changes would impact the existing APIs.
>  
> Note: I do not intend to commit those changes as they are. They will go
> through the usual review process in due time.
> 
> The patches contain “// ***”-like comments that give a rough explanation of
> why those changes are needed w.r.t. the goals.
> The order of the patches could be modified since the dependencies between
> them are not sequential. Anyhow, here are the patches:
> 1. Introduce (some of) the generic opcodes.
> 2. Make MachineFunction more independent of LLVM IR to eventually be able
> to delete the LLVM IR instance from memory.
> 3. Extend MachineInstr to represent additional information attached to
> generic opcodes.
> 4. Teach MachineRegisterInfo about size for virtual registers.
> 5. Introduce a helper class to build MachineInstr related objects.
> 6. Add new target hooks to lower the ABI directly to MachineInstr.
> 7. Introduce the IRTranslator pass.
> 
> 
> ** Roadmap for the Prototype **
> 
> We plan to split the prototype into three main milestones:
> 1. Translation: LLVM IR to (G) MachineInstr translation.
> 2. Basic selector: Legal LLVM IR to target-specific MachineInstr.
> 3. Simple legalization: Support scalar type legalization and some vector
> instructions.
> 
> Notes:
> - For #1, we will not support any fancy instructions like landing pad or
> switch.
> - Each milestone should take about 3-4 months.
> - At the end of #2, we would have a FastISel-like selector.
> 
> Each milestone will be detailed right before starting it. The rationale is
> that we want to accommodate what we discover with the prototype in the next
> milestone. In other words, in this email, I only describe the first milestone
> in detail, and I will give more details on the next milestone shortly before
> we start it, and so on. For your information, here is the remainder of the
> intended roadmap for the full project:
> 4. Productization: Clean up implementation, stabilize the APIs.
> 5. Complex legalization: Extend legalization support to everything missing.
> 6. Completeness: Fill the blanks, e.g., landing pad.
> 7. Clean-up and performance: Add the necessary bits to be at parity or beat
SelectionDAG generated code.
> 8. Transition: Document how to switch, provide tools to help.
> 
> 
> ** Milestone 1 **
> 
> The first phase is focused on the IRTranslator pass.
> 
> The IRTranslator is responsible for translating the LLVM IR into Generic
MachineInstr. The IRTranslator pass uses some target hooks to perform the ABI
lowering. We can either define a new API for them, e.g., ABILoweringInfo, or
extend the existing TargetLowering.
> Moreover, the prototype will focus on simple instructions, i.e., we will not
> support switch or landing pad for this iteration.
> 
> At the end of M1, the prototype will not be able to produce code, since we
would only have the beginning of the Global ISel pipeline. Instead, we will test
the IRTranslator on the generic output that is produced from the tested IR.
> 
> * Design Decisions *
> 
> - The IRTranslator is a final class. Its purpose is to move away from LLVM
IR to MachineInstr world [final].
> - Lower the ABI as part of the translation process [final].
> 
> * Design Questions the Prototype Addresses at the End of M1 *
> 
> - Handling of aggregate types during the translation.
> - Lowering of switches.
> - What about Module pass for Machine pass?
> - Introduce new APIs to have a clearer separation between:
>   - Legalization (setOperationAction, etc.)
>   - Cost/Combine related (isXXXFree, etc.)
>   - Lowering related (LowerFormal, etc.)
> - What is the contract with the backends? Is it still “should be able to
select any valid LLVM IR”?
> 
> Thanks,
> -Quentin
>  

Philip Reames via llvm-dev

2016-Jan-13 00:11 UTC

[llvm-dev] [GlobalISel] A Proposal for global instruction selection

I think after reading your link I'm actually more confused.  This might 
just be a wording problem, but let me ask a couple of clarifying questions.

1) After compiling the code sequence below (from that page), does the
in-memory bit pattern differ?  The page seemed to contradict itself.

%0 = load <4 x i32> %x
%1 = bitcast <4 x i32> %0 to <2 x i64>
      store <2 x i64> %1, <2 x i64>* %y

2) If so, does this mean that performing dead-store-elimination is 
illegal for ARM?

3) Are loads and stores ever allowed to fault based on the in-memory
representation?

4) What happens if we have a load of <2 x i64> following the store above
and we DSE the store before forwarding its value?
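
For question 1, a small model of the store/load definition may help separate
the memory image from the register image. This Python sketch is illustrative
only: it assumes each element is stored in the target byte order with element 0
at the lowest address, and that lane 0 sits in the lowest register bits, as in
the MSA layout Daniel gives in the quoted thread. The function names are made
up, and the model does not address faulting or DSE.

```python
def store_vector(elems, elem_bytes, byteorder):
    """Memory image: element 0 at the lowest address, each element in byteorder."""
    return b"".join(e.to_bytes(elem_bytes, byteorder) for e in elems)


def load_vector(mem, elem_bytes, byteorder):
    return [int.from_bytes(mem[i:i + elem_bytes], byteorder)
            for i in range(0, len(mem), elem_bytes)]


def register_image(elems, elem_bits):
    """Register image under the assumed layout: lane 0 in the lowest bits."""
    image = 0
    for lane, e in enumerate(elems):
        image |= e << (lane * elem_bits)
    return image


def bitcast_v4i32_to_v2i64(v4i32, byteorder):
    # The LangRef definition: store as the source type, reload as the destination type.
    return load_vector(store_vector(v4i32, 4, byteorder), 8, byteorder)


x = [1, 2, 3, 4]
for order in ("little", "big"):
    y = bitcast_v4i32_to_v2i64(x, order)
    same_memory = store_vector(x, 4, order) == store_vector(y, 8, order)
    same_register = register_image(x, 32) == register_image(y, 64)
    print(order, "memory identical:", same_memory, "register identical:", same_register)
```

Under these assumptions the bytes written to %y match the bytes read from %x in
both byte orders, while the register images differ only in the big-endian case,
which is where the bitcast would need real instructions.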

Philip


On 01/12/2016 05:55 AM, James Molloy via llvm-dev wrote:
> Hi,
>
> > I found this thinking quite difficult to explain. Does it make sense?
>
> It might help to link to the documentation on why bitcasts are weird 
> on big-endian NEON: http://llvm.org/docs/BigEndianNEON.html#bitconverts
>
> Cheers,
>
> James
>
> On Tue, 12 Jan 2016 at 13:23 Daniel Sanders via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>
>     Hi,
>
>     I haven't found much time to look into the LLVM-IR-level
>     optimizations yet so I'm not sure how they handle bitcasts. With
>     that disclaimer in mind, I expect it's fine for the LLVM-IR level
>     optimizations to handle them using either definition since they
>     are equivalent at the LLVM-IR level. My thinking is that LLVM-IR
>     is consistent about how virtual bits are assigned to types and
>     that non-zero instruction nops arise when there is inconsistency.
>
>     At the LLVM-IR level, bits 0-127 of <4 x i32> map directly onto
>     bits 0-127 of <2 x i64> using the identity map. It's therefore ok
>     to interpret such bitcasts as zero-instruction no-ops. As far as I
>     can tell, LLVM-IR has been defined such that the identity map can
>     be used for bitcasts between all same-sized types, and also such
>     that bitcasting between different-sized types is invalid.
>
>     Similarly, most targets have a single mapping of virtual bit
>     numbers to physical bit numbers for each size that is applied
>     consistently when mapping a type to memory. For example 32-bits
>     map like so:
>
>     Little Endian Targets: virtual register bits
>     {0..7,8..15,16..23,24..31} map to physical memory bits
>     {0..7,8..15,16..23,24..31}
>
>     Big Endian Targets: virtual register bits
>     {0..7,8..15,16..23,24..31} map to physical memory bits
>     {24..31,16..23,8..15,0..7}
>
>     regardless of whether it's a float, or an i32. We therefore need
>     zero instructions to re-map physical memory bits for one type onto
>     another type.
>
>     The same idea holds for physical register classes. There's a
>     single consistent mapping from physical memory bits to physical
>     register bits that applies for all types that can be stored in
>     that class. As long as this is the case the load/store and
>     zero-instruction interpretation of bitcasts are equivalent.
>
>     In the case of big-endian MSA and NEON, there isn't a single
>     consistent mapping from physical memory bits to physical register
>     bits so the equivalence in the two definitions breaks down:
>
>     i128: virtual register bits {0..31, 32..63, 64..95, 96..127} map
>     to physical memory bits {96..127, 64..95, 32..63, 0..31}
>
>     <4 x i32>: virtual register bits {0..31, 32..63, 64..95, 96..127}
>     map to physical memory bits {0..31, 32..63, 64..95, 96..127}
>
>     <2 x i64>: virtual register bits {0..31, 32..63, 64..95, 96..127}
>     map to physical memory bits {32..63, 0..31, 96..127, 64..95}
>
>     With these inconsistent mappings we require instructions to
>     bitcast between the types.
>
>     I found this thinking quite difficult to explain. Does it make sense?
>
>     > I am fine with treating bit casts as equivalent store/load pairs
>     in GISel, I just want to be sure we do not have a semantic gap
>     between the LLVM-IR and the backend if we do.
>
>     I think a gap would arise from not having a GISel equivalent to
>     ISD::BITCAST (gBITCAST?) available when it's necessary for
>     correctness. However, I agree that GISel should delete bitcasts
>     for the common case where the store/load and zero-instruction
>     definitions are equivalent.
>
>     *From:*Quentin Colombet [mailto:qcolombet at apple.com
>     <mailto:qcolombet at apple.com>]
>     *Sent:* 11 January 2016 17:23
>     *To:* Daniel Sanders
>     *Cc:* Tim Northover (t.p.northover at gmail.com
>     <mailto:t.p.northover at gmail.com>); llvm-dev
>
>
>     *Subject:* Re: [llvm-dev] [GlobalISel] A Proposal for global
>     instruction selection
>
>     Hi Daniel,
>
>     Thanks for the pointers, I wasn’t aware of the second thread
>     you’ve mentioned.
>
>     I may be wrong but I think LLVM-IR optimizations really treat
>     bitcasts as no-op casts, in the sense that no instructions are
>     required.
>
>     Is there anyone that could chime in on that?
>
>     However, it seems SelectionDAG sticks to the load/store semantic:
>
>     "BITCAST - This operator converts between integer, vector and FP
>     values, as if the value was *stored to memory with one type and
>     loaded from the same address with the other type* (or equivalently
>     for vector format conversions, etc)."
>
>     I am fine with treating bit casts as equivalent store/load pairs
>     in GISel, I just want to be sure we do not have a semantic gap
>     between the LLVM-IR and the backend if we do.
>
>     Thanks,
>
>     -Quentin
>
>         On Jan 11, 2016, at 7:43 AM, Daniel Sanders
>         <Daniel.Sanders at imgtec.com> wrote:
>
>         Hi,
>
>         It was a comment by Tim that first made me aware of it
>         (see http://lists.llvm.org/pipermail/llvm-dev/2013-August/064714.html
>         but I think he commented on one of my patches before that).
>
>         I asked about it on llvm-dev a couple weeks later
>         (http://lists.llvm.org/pipermail/llvm-dev/2013-August/064919.html)
>         highlighting the contradiction and was told that 'no-op cast'
>         referred to the lack of math rather than a requirement that
>         zero instructions are used. It's therefore my understanding
>         that shuffling the bits to preserve the load/store based
>         definition isn't considered to be changing the bits.
>
>         I think the main thing the current definition is unclear on is
>         whether it refers to the bits in a physical machine register
>         or the bits in the LLVM-IR virtual register. Most of the time
>         these two views are the same but this doesn't quite work for
>         big-endian MSA/NEON. For example:
>
>         %0 = bitcast <4 x i32> <i32 1, i32 2, i32 3, i32 4> to <2 x i64>
>
>         %0 = <2 x i64> <i64 (1 << 32) | 2, i64 (3 << 32) | 4>
>
>         are equivalent to each other in LLVM-IR terms but the
>         constants are physically laid out in MSA registers as:
>
>         0x00000004000000030000000200000001 # <4 x i32> <i32 1, i32 2, i32 3, i32 4>
>
>         0x00000003000000040000000100000002 # <2 x i64> <i64 (1 << 32) | 2, i64 (3 << 32) | 4>
>
>         and we must therefore shuffle the bits to preserve LLVM-IR's
>         point of view.
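The two register images quoted above can be reproduced with a few lines of arithmetic. This is a sketch that assumes, as the hex values in the message imply, that lane 0 sits in the least significant bits of the 128-bit register; `pack_lanes` is a hypothetical helper name.

```python
def pack_lanes(lanes, lane_bits):
    """Build a 128-bit register image with lane 0 in the least significant bits."""
    mask = (1 << lane_bits) - 1
    value = 0
    for index, lane in enumerate(lanes):
        value |= (lane & mask) << (index * lane_bits)
    return value

# The same LLVM-IR value, seen as two different vector types.
v4i32 = pack_lanes([1, 2, 3, 4], 32)
v2i64 = pack_lanes([(1 << 32) | 2, (3 << 32) | 4], 64)

print(f"0x{v4i32:032x}")  # <4 x i32> register image
print(f"0x{v2i64:032x}")  # <2 x i64> register image
```

The two images differ by a swap of the 32-bit halves inside each 64-bit lane, which is the shuffle the bitcast has to perform on this kind of target.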
>
>         *From:* Quentin Colombet [mailto:qcolombet at apple.com]
>         *Sent:* 07 January 2016 19:58
>         *To:* Daniel Sanders
>         *Cc:* llvm-dev
>         *Subject:* Re: [llvm-dev] [GlobalISel] A Proposal for global
>         instruction selection
>
>         Hi Daniel,
>
>         I had a quick look at the language reference for bitcast and I
>         have a different reading than what you were pointing out.
>
>         Indeed, my takeaway is:
>
>         "It is *always a no-op cast* because no bits change with
>         this conversion."
>
>         In other words, deleting all bitcast instructions should be fine.
>
>         My understanding of the quote you’ve highlighted is that it
>         tells C programmers that this is like a memcpy, not a cast :).
>
>         Cheers,
>
>         -Quentin
>
>             On Nov 20, 2015, at 6:53 AM, Daniel Sanders
>             <Daniel.Sanders at imgtec.com> wrote:
>
>             Hi,
>
>             I haven't had chance to read all of this yet, but one
>             minor thing occurred to me during your presentation that I
>             want to mention. At one point you mentioned deleting all
>             the bitcast instructions since they're equivalent to nops
>             but this isn't always true.
>
>             The http://llvm.org/docs/LangRef.html definition of the
>             bitcast instruction includes this sentence:
>
>             The conversion is done as if the value had been stored to
>             memory and read back as type ty2.
>
>             For big-endian MSA, this is equivalent to a shuffling of
>             the bits in the register because endianness only changes
>             the byte order within each element. The order of the
>             elements is unaffected by endianness. IIRC, big-endian
>             NEON is the same way.
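The store/load wording from the LangRef can be tried out directly on a byte buffer. The sketch below uses Python's `struct` module to model memory; the function name is made up for illustration, and it models only the LLVM-IR view of the value, not any particular register file.

```python
import struct

def bitcast_v4i32_to_v2i64(lanes, big_endian):
    """Store four i32 lanes to memory, then read the same bytes back as two i64 lanes."""
    order = ">" if big_endian else "<"
    memory = struct.pack(order + "4I", *lanes)   # store <4 x i32>
    return struct.unpack(order + "2Q", memory)   # load  <2 x i64>

print([hex(x) for x in bitcast_v4i32_to_v2i64([1, 2, 3, 4], big_endian=True)])
print([hex(x) for x in bitcast_v4i32_to_v2i64([1, 2, 3, 4], big_endian=False)])
```

On the big-endian side the first i32 lane lands in the high half of the first i64 lane, and on the little-endian side in the low half. The element order in memory is the same in both cases; only the byte order within each element changes.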
>
>             *From:* llvm-dev
>             [mailto:llvm-dev-bounces at lists.llvm.org] *On Behalf
>             Of* Quentin Colombet via llvm-dev
>             *Sent:* 18 November 2015 19:27
>             *To:* llvm-dev
>             *Subject:* [llvm-dev] [GlobalISel] A Proposal for global
>             instruction selection
>
>             Hi,
>
>             With this email, I would like to kick off the development
>             of the next instruction selector that I described during
>             the last LLVM Dev’ Meeting.
>             For the motivations, see Jakob’s proposal
>             (http://lists.cs.uiuc.edu/pipermail/llvmdev/2013-August/064727.html)
>             and for the proposal, see the slides (Keynote:
>             http://llvm.org/viewvc/llvm-project/www/trunk/devmtg/2015-10/slides/Colombet-GlobalInstructionSelection.key?view=co
>             or PDF:
>             http://llvm.org/viewvc/llvm-project/www/trunk/devmtg/2015-10/slides/Colombet-GlobalInstructionSelection.pdf?revision=252430&view=co)
>             or the talk
>             (https://www.youtube.com/watch?v=F6GGbYtae3g&list=PL_R5A0lGi1AA4Lv2bBFSwhgDaHvvpVU21&index=2).
>
>
>             TL;DR This is happening now, feedback invited!
>
>             *** Context ***
>
>             During the last LLVM Dev’ Meeting, I presented a
>             proposal for the next instruction selector, GlobalISel.
>             The proposal is basically summarized in “High Level
>             Prototype Design” and “Roadmap”. (If you want further
>             details, feel free to reach out to me.)
>
>             The first step of the development plan is to prototype the
>             new framework in open source. The idea is to *start
>             prototyping now(!)* and have the discussion ongoing in
>             parallel. The reason for this approach is to have code that
>             can be used to inform those discussions, e.g., by
>             collecting data and trying different design approaches.
>             Regarding the discussion, I have listed a few points where
>             your feedback would be particularly appreciated (see
>             Feedback Invite).
>
>
>             Also, as I mentioned in my talk, some issues are
>             controversial but I expect them to be resolved during
>             prototype development. Specifically, these concern aspects
>             of legalization (should parts of it be done at the LLVM IR
>             level or all at the MI level?) and code re-use for the
>             instruction combiner. Please feel free to bring up your
>             specific concerns as I move along with the development plan.
>
>             I expect the design to evolve with our experimental
>             findings and your feedback and contributions.
>             Nonetheless, we expect to nail down some design decisions
>             once and for all as the prototype progresses. I have
>             highlighted them with the following pattern *[final]*.
>
>
>
>             *** Feedback Invite ***
>
>             If you follow and support this work you need to be aware
>             of three things and I am eager to hear your feedback and
>             thoughts about them: the overall goals of Global ISel, the
>             goals of the prototype, and the impact of the prototype
>             work on backend design.
>
>             In the section “Goals”, I defined (repeated for people
>             that saw the talk) the goals for the Global ISel design.
>             - Do you see anything missing?
>             - Do you see something that should not be there?
>
>             The prototype will answer critical design questions (see
>             “Design Questions the Prototype Addresses at the End of
>             M1” for examples) before the actual design of Global ISel
>             is finalized, but it cannot cover everything.
>             Specifically we will **not** look into improving TableGen
>             or reusing InstCombine (see “Proposed Approach” for the
>             rationale). Please let me know if you see any issue with that.
>
>             There is also basic ground work needed to prepare for
>             Global ISel and I need to extend the core
>             MachineInstr-level APIs as explained during the talk. For
>             this, I prepared sketches of patches to illustrate them
>             and describe the details in the “Implications” section
>             below. Please have a look at the patches to have a better
>             idea of the expected impact.
>
>             If there is anything else you want to discuss related to
>             Global ISel, feel free to reach out to me. In particular,
>             several people expressed interest during the LLVM Dev
>             Meeting in contributing to the project. Let me know what
>             your area of interest is, so that we can coordinate our
>             efforts.
>             Anyhow, please add [GlobalISel] in the subject line to
>             help categorize the emails.
>
>
>
>             *** Goals ***
>
>             The high level goals of the new instruction selector are:
>             - Global instruction selector.
>             - Fast instruction selector.
>             - Shared code path for fast and good instruction selection.
>             - IR that represents ISA concepts better.
>             - More flexible instruction selector.
>             - Easier to maintain/understand framework, in particular
>             legalization.
>             - Self contained machine representation, no back links to
>             LLVM IR.
>             - No change to LLVM IR.
>
>             Note: The goals are common to all targets. In particular,
>             we do not intend to work on target-specific features for
>             the prototype.
>             The bottom line is: please make sure those goals are
>             compatible with what you want to achieve for your target,
>             even if your requirement is not listed here.
>
>
>
>             *** Proposed Approach ***
>
>             In this section, I describe the approach I plan to pursue
>             in the prototype and the roadmap to get there. The final
>             design will flow out of it.
>
>             For this prototype, we purposely exclude any work to
>             improve or use TableGen or InstCombine *[final].* We will
>             keep in mind, however, that some of the C++ code we write
>             will be table-generated at some point.
>             The rationale is that we do not want to lay down a new
>             TableGen/InstCombine infrastructure before being able to
>             work on the ISel framework itself.
>
>             The prototype vehicle will be *AArch64*. None of the
>             changes for GlobalISel will negatively impact the existing
>             ISel.
>
>
>             ** High Level Prototype Design **
>
>             As shown in the talk, the expected pipeline for the
>             prototype is:
>             *LLVM IR* -> IRTranslator -> *Generic (G) MachineInstr* ->
>             Legalizer -> RegBankSelect -> Select -> *MachineInstr*
>
>             Where:
>             - Terms in *bold* are intermediate representations.
>             -  Generic MachineInstrs are machine instructions with a
>             generic opcode, e.g., ADD, COPY.
>
>             - IRTranslator: Translate LLVM IR to (G) MachineInstr.
>             - Legalizer: Legalize illegal (G) MachineInstr to legal
>             (G) MachineInstr.
>             - RegBankSelect: Map virtual registers that only have a
>             size to virtual registers with a Register Bank.
>             - Select: Translate the remaining (G) MachineInstr to
>             MachineInstr.
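As a reading aid for the pipeline above, here is a toy model of the pass ordering. It only tracks which representation each pass consumes and produces; the table and the `run` helper are assumptions made for illustration and do not correspond to any LLVM pass-manager API.

```python
# (pass name, representation consumed, representation produced)
PIPELINE = [
    ("IRTranslator",  "LLVM IR",              "Generic MachineInstr"),
    ("Legalizer",     "Generic MachineInstr", "Generic MachineInstr"),
    ("RegBankSelect", "Generic MachineInstr", "Generic MachineInstr"),
    ("Select",        "Generic MachineInstr", "MachineInstr"),
]

def run(pipeline, representation):
    """Walk the passes in order and check that each one gets the input it expects."""
    trace = [representation]
    for name, consumes, produces in pipeline:
        if representation != consumes:
            raise ValueError(f"{name} expects {consumes}, got {representation}")
        representation = produces
        trace.append(representation)
    return trace

print(" -> ".join(run(PIPELINE, "LLVM IR")))
```

Legalizer and RegBankSelect both stay within the generic representation; only IRTranslator and Select change the representation.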
>
>
>
>             ** Implications **
>
>             As part of the bring-up of the prototype, we need to
>             extend some of the core MachineInstr-level APIs:
>               - Need to remember FastMath flags for each MachineInstr.
>               - Need to know the type of each MachineInstr. We don’t
>             want ADD8, ADD16, etc.
>               - Extend the MachineRegisterInfo to support size as well
>             as register classes for virtual registers.
>
>             I have sketched the changes in the attached patches to
>             help picturing how the changes would impact the existing APIs.
>
>             Note: I do not intend to commit those changes as they are.
>             They will go the usual review process in due time.
>
>
>             The patches contain “// ***”-like comments that give a
>             rough explanation of why those changes are needed w.r.t.
>             the goals.
>             The order of the patches could be modified since the
>             dependencies between them are not sequential. Anyhow,
>             here are the patches:
>             1. Introduce (some of) the generic opcode.
>             2. Make MachineFunction more independent of LLVM IR to
>             eventually be able to delete the LLVM IR instance from the
>             memory.
>             3. Extend MachineInstr to represent additional information
>             attached to generic opcode.
>             4. Teach MachineRegisterInfo about size for virtual registers.
>             5. Introduce a helper class to build MachineInstr related
>             objects.
>             6. Add new target hooks to lower the ABI directly to
>             MachineInstr.
>             7. Introduce the IRTranslator pass.
>
>
>             ** Roadmap for the Prototype **
>
>             We plan to split the prototype in three main milestones:
>             1. Translation: LLVM IR to (G) MachineInstr translation.
>             2. Basic selector: Legal LLVM IR to target specific
>             MachineInstr.
>             3. Simple legalization: Support scalar type legalization
>             and some vector instructions.
>
>             Notes:
>             - For #1, we will not support any fancy instructions like
>             landing pad or switch.
>             - Each milestone should take about 3-4 months.
>
>             - At the end of #2, we would have a FastISel like selector.
>
>             Each milestone will be detailed right before starting it.
>             The rationale is that we want to accommodate what we
>             discovered with the prototype for the next milestone. In
>             other words, in this email, *I only describe the first
>             milestone* in detail and I will give more details on the
>             next milestone shortly before we start it and so on. For
>             your information, here is the remainder of the intended
>             roadmap for the *full* project:
>             4. Productization: Clean up implementation, stabilize the
>             APIs.
>             5. Complex legalization: Extend legalization support to
>             everything missing.
>             6. Completeness: Fill the blanks, e.g., landing pad.
>             7. Clean-up and performance: Add the necessary bits to be
>             at parity or beat SelectionDAG generated code.
>             8. Transition: Document how to switch, provide tools to help.
>
>
>             ** Milestone 1 **
>
>             The first phase is focused on the IRTranslator pass.
>
>             The IRTranslator is responsible for translating the LLVM
>             IR into Generic MachineInstr. The IRTranslator pass uses
>             some target hooks to perform the ABI lowering. We can
>             either define a new API for them, e.g., ABILoweringInfo,
>             or extend the existing TargetLowering.
>             Moreover, the prototype will focus on simple instructions,
>             i.e., we will not support switch or landing pad for this
>             iteration.
>
>             At the end of M1, the prototype will not be able to
>             produce code, since we would only have the beginning of
>             the Global ISel pipeline. Instead, we will test the
>             IRTranslator on the generic output that is produced from
>             the tested IR.
>
>             * Design Decisions *
>
>             - The IRTranslator is a final class. Its purpose is to
>             move away from LLVM IR to MachineInstr world *[final]*.
>             - Lower the ABI as part of the translation process *[final]*.
>
>             * Design Questions the Prototype Addresses at the End of M1 *
>
>             - Handling of aggregate types during the translation.
>             - Lowering of switches.
>             - What about Module pass for Machine pass?
>             - Introduce new APIs to have a clearer separation between:
>               - Legalization (setOperationAction, etc.)
>               - Cost/Combine related (isXXXFree, etc.)
>               - Lowering related (LowerFormal, etc.)
>             - What is the contract with the backends? Is it still
>             “should be able to select any valid LLVM IR”?
>
>             Thanks,
>
>             -Quentin
>
>     _______________________________________________
>     LLVM Developers mailing list
>     llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>
>     http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Quentin Colombet via llvm-dev

2016-Jan-13 00:42 UTC

[llvm-dev] [GlobalISel] A Proposal for global instruction selection

Hi James,

I am also confused!

> On Jan 12, 2016, at 4:11 PM, Philip Reames <listmail at philipreames.com> wrote:
>
> I think after reading your link I'm actually more confused.  This might
> just be a wording problem, but let me ask a couple of clarifying questions.
>
> 1) After compiling the code sequence below (from that page), does the
> in-memory bit pattern differ?  The page seemed to contradict itself.

+1

Thanks,
Q.

> %0 = load <4 x i32> %x
> %1 = bitcast <4 x i32> %0 to <2 x i64>
> store <2 x i64> %1, <2 x i64>* %y
> 
> 2) If so, does this mean that performing dead-store-elimination is illegal
> for ARM?
>
> 3) Are loads and stores ever allowed to fault based on the in-memory
> representation?
>
> 4) What happens if we have a load of <2 x i64> following the store
> above and we DSE the store before forwarding its value?
> 
> Philip
> 
> 
> On 01/12/2016 05:55 AM, James Molloy via llvm-dev wrote:
>> Hi,
>> 
>> > I found this thinking quite difficult to explain. Does it make sense?
>>
>> It might help to link to the documentation on why bitcasts are weird on
>> big-endian NEON: http://llvm.org/docs/BigEndianNEON.html#bitconverts
>> 
>> Cheers,
>> 
>> James
>> 
>> On Tue, 12 Jan 2016 at 13:23 Daniel Sanders via llvm-dev
>> <llvm-dev at lists.llvm.org> wrote:
>> Hi,
>> 
>>  
>> I haven't found much time to look into the LLVM-IR-level
>> optimizations yet so I'm not sure how they handle bitcasts. With that
>> disclaimer in mind, I expect it's fine for the LLVM-IR level optimizations
>> to handle them using either definition since they are equivalent at the
>> LLVM-IR level. My thinking is that LLVM-IR is consistent about how virtual
>> bits are assigned to types and that non-zero instruction nops arise when
>> there is inconsistency.
>> 
>>  
>> At the LLVM-IR level, bits 0-127 of <4 x i32> map directly onto
>> bits 0-127 of <2 x i64> using the identity map. It's therefore ok to
>> interpret such bitcasts as zero-instruction no-ops. As far as I can tell,
>> LLVM-IR has been defined such that the identity map can be used for bitcasts
>> between all same-sized types, and also such that bitcasting between
>> different-sized types is invalid.
>> 
>>  
>> Similarly, most targets have a single mapping of virtual bit numbers to
>> physical bit numbers for each size that is applied consistently when
>> mapping a type to memory. For example 32-bits map like so:
>>
>> Little Endian Targets: virtual register bits {0..7,8..15,16..23,24..31}
>> map to physical memory bits {0..7,8..15,16..23,24..31}
>>
>> Big Endian Targets: virtual register bits {0..7,8..15,16..23,24..31}
>> map to physical memory bits {24..31,16..23,8..15,0..7}
>>
>> regardless of whether it's a float, or an i32. We therefore need
>> zero instructions to re-map physical memory bits for one type onto
>> another type.
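The scalar case described above can be checked with a small store/load experiment. This is only a sketch that models memory with Python's `struct` module; `store_then_load` is a hypothetical helper name.

```python
import struct

def store_then_load(value, store_fmt, load_fmt, big_endian):
    """Store a 32-bit value with one type and load the same bytes with another type."""
    order = ">" if big_endian else "<"
    memory = struct.pack(order + store_fmt, value)
    return struct.unpack(order + load_fmt, memory)[0]

# Store a float, load it back as an i32, for both byte orders.
for big_endian in (False, True):
    bits = store_then_load(1.0, "f", "I", big_endian)
    print("big-endian" if big_endian else "little-endian", hex(bits))
```

Both byte orders give the same integer because float and i32 share one byte mapping on each target, so the store/load definition and the zero-instruction definition agree for this case.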
>> 
>>  
>> The same idea holds for physical register classes. There's a single
>> consistent mapping from physical memory bits to physical register bits that
>> applies for all types that can be stored in that class. As long as this is
>> the case the load/store and zero-instruction interpretation of bitcasts are
>> equivalent.
>>
>> In the case of big-endian MSA and NEON, there isn't a single
>> consistent mapping from physical memory bits to physical register bits so
>> the equivalence in the two definitions breaks down:
>> 
>>     i128: virtual register bits {0..31, 32..63, 64..95, 96..127}
>>     map to physical memory bits {96..127, 64..95, 32..63, 0..31}
>>
>>     <4 x i32>: virtual register bits {0..31, 32..63, 64..95, 96..127}
>>     map to physical memory bits {0..31, 32..63, 64..95, 96..127}
>>
>>     <2 x i64>: virtual register bits {0..31, 32..63, 64..95, 96..127}
>>     map to physical memory bits {32..63, 0..31, 96..127, 64..95}
>>
>> with these inconsistent mappings we require instructions to bitcast
>> between the types.
>> 
>>  
>> I found this thinking quite difficult to explain. Does it make sense?
>> 
>>  
>> > I am fine with treating bit casts as equivalent store/load pairs
>> > in GISel, I just want to be sure we do not have a semantic gap between
>> > the LLVM-IR and the backend if we do.
>>
>>
>> I think a gap would arise from not having a GISel equivalent to
>> ISD::BITCAST (gBITCAST?) available when it's necessary for correctness.
>> However, I agree that GISel should delete bitcasts for the common case
>> where the store/load and zero-instruction definitions are equivalent.
>> 
>>  
>> From: Quentin Colombet [mailto:qcolombet at apple.com]
>> Sent: 11 January 2016 17:23
>> To: Daniel Sanders
>> Cc: Tim Northover (t.p.northover at gmail.com); llvm-dev
>> Subject: Re: [llvm-dev] [GlobalISel] A Proposal for global instruction
>> selection
>> 
>>  
>> Hi Daniel,
>> 
>>  
>> Thanks for the pointers, I wasn’t aware of the second thread you’ve
>> mentioned.
>>
>>
>> I may be wrong but I think LLVM-IR optimizations really treat bitcasts
>> as no-op casts, in the sense that no instructions are required.
>> 
>>  
>> Is there anyone that could chime in on that?
>> 
>>  
>> However, it seems SelectionDAG sticks to the load/store semantic:
>>
>> "BITCAST - This operator converts between integer, vector and FP
>> values, as if the value was stored to memory with one type and loaded from
>> the same address with the other type (or equivalently for vector format
>> conversions, etc)."
>>
>>
>> I am fine with treating bit casts as equivalent store/load pairs in
>> GISel, I just want to be sure we do not have a semantic gap between the
>> LLVM-IR and the backend if we do.
>> 
>>  
>> Thanks,
>> 
>> -Quentin
>> 
>>  
>> On Jan 11, 2016, at 7:43 AM, Daniel Sanders
>> <Daniel.Sanders at imgtec.com> wrote:
>> 
>>  
>> Hi,
>> 
>>  
>> It was a comment by Tim that first made me aware of it (see
>> http://lists.llvm.org/pipermail/llvm-dev/2013-August/064714.html but I
>> think he commented on one of my patches before that).
>>
>>
>> I asked about it on llvm-dev a couple weeks later
>> (http://lists.llvm.org/pipermail/llvm-dev/2013-August/064919.html)
>> highlighting the contradiction and was told that 'no-op cast' referred
>> to the lack of math rather than a requirement that zero instructions are
>> used. It's therefore my understanding that shuffling the bits to preserve
>> the load/store based definition isn't considered to be changing the bits.
>> 
>>  
>> I think the main thing the current definition is unclear on is whether
>> it refers to the bits in a physical machine register or the bits in the
>> LLVM-IR virtual register. Most of the time these two views are the same but
>> this doesn't quite work for big-endian MSA/NEON. For example:
>>
>> %0 = bitcast <4 x i32> <i32 1, i32 2, i32 3, i32 4> to <2 x i64>
>>
>> %0 = <2 x i64> <i64 (1 << 32) | 2, i64 (3 << 32) | 4>
>>
>> are equivalent to each other in LLVM-IR terms but the constants are
>> physically laid out in MSA registers as:
>>
>> 0x00000004000000030000000200000001 # <4 x i32> <i32 1, i32 2, i32 3, i32 4>
>>
>> 0x00000003000000040000000100000002 # <2 x i64> <i64 (1 << 32) | 2, i64 (3 << 32) | 4>
>>
>> and we must therefore shuffle the bits to preserve LLVM-IR's point
>> of view.
>> 
>>  
>> From: Quentin Colombet [mailto:qcolombet at apple.com]
>> Sent: 07 January 2016 19:58
>> To: Daniel Sanders
>> Cc: llvm-dev
>> Subject: Re: [llvm-dev] [GlobalISel] A Proposal for global instruction
>> selection
>> 
>>  
>> Hi Daniel,
>> 
>>  
>> I had a quick look at the language reference for bitcast and I have a
>> different reading than what you were pointing out.
>>
>> Indeed, my takeaway is:
>>
>> "It is always a no-op cast because no bits change with this
>> conversion."
>>
>>
>> In other words, deleting all bitcast instructions should be fine.
>>
>>
>> My understanding of the quote you’ve highlighted is that it tells C
>> programmers that this is like a memcpy, not a cast :).
>> 
>>  
>> Cheers,
>> 
>> -Quentin
>> 
>> On Nov 20, 2015, at 6:53 AM, Daniel Sanders
>> <Daniel.Sanders at imgtec.com> wrote:
>> 
>>  
>> Hi,
>> 
>>  
>> I haven't had chance to read all of this yet, but one minor thing
>> occurred to me during your presentation that I want to mention. At one
>> point you mentioned deleting all the bitcast instructions since they're
>> equivalent to nops but this isn't always true.
>>
>>
>> The http://llvm.org/docs/LangRef.html definition of the bitcast
>> instruction includes this sentence:
>>
>> The conversion is done as if the value had been stored to memory and
>> read back as type ty2.
>>
>> For big-endian MSA, this is equivalent to a shuffling of the bits in
>> the register because endianness only changes the byte order within each
>> element. The order of the elements is unaffected by endianness. IIRC,
>> big-endian NEON is the same way.
>> 
>>  
>> From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of
>> Quentin Colombet via llvm-dev
>> Sent: 18 November 2015 19:27
>> To: llvm-dev
>> Subject: [llvm-dev] [GlobalISel] A Proposal for global instruction
>> selection
>> 
>>  
>> Hi,
>> 
>> With this email, I would like to kick off the development of the next
>> instruction selector that I described during the last LLVM Dev’ Meeting.
>> For the motivations, see Jakob’s proposal
>> (http://lists.cs.uiuc.edu/pipermail/llvmdev/2013-August/064727.html) and
>> for the proposal, see the slides (Keynote:
>> http://llvm.org/viewvc/llvm-project/www/trunk/devmtg/2015-10/slides/Colombet-GlobalInstructionSelection.key?view=co
>> or PDF:
>> http://llvm.org/viewvc/llvm-project/www/trunk/devmtg/2015-10/slides/Colombet-GlobalInstructionSelection.pdf?revision=252430&view=co)
>> or the talk
>> (https://www.youtube.com/watch?v=F6GGbYtae3g&list=PL_R5A0lGi1AA4Lv2bBFSwhgDaHvvpVU21&index=2).
>>
>>
>> TL;DR This is happening now, feedback invited!
>> 
>> *** Context ***
>> 
>> During the last LLVM Dev’ Meeting, I presented a proposal for the
>> next instruction selector, GlobalISel. The proposal is basically summarized
>> in “High Level Prototype Design” and “Roadmap”. (If you want further
>> details, feel free to reach out to me.)
>>
>> The first step of the development plan is to prototype the new
>> framework in open source. The idea is to start prototyping now(!) and have
>> the discussion ongoing in parallel. The reason for this approach is to have
>> code that can be used to inform those discussions, e.g., by collecting data
>> and trying different design approaches. Regarding the discussion, I have
>> listed a few points where your feedback would be particularly appreciated
>> (see Feedback Invite).
>>
>>
>> Also, as I mentioned in my talk, some issues are controversial but
>> I expect them to be resolved during prototype development. Specifically,
>> these concern aspects of legalization (should parts of it be done at the
>> LLVM IR level or all at the MI level?) and code re-use for the instruction
>> combiner. Please feel free to bring up your specific concerns as I move
>> along with the development plan.
>>
>> I expect the design to evolve with our experimental findings and your
>> feedback and contributions.
>> Nonetheless, we expect to nail down some design decisions once and for
>> all as the prototype progresses. I have highlighted them with the following
>> pattern [final].
>> 
>> 
>> 
>> *** Feedback Invite ***
>> 
>> If you follow and support this work you need to be aware of three
>> things and I am eager to hear your feedback and thoughts about them: the
>> overall goals of Global ISel, the goals of the prototype, and the impact of
>> the prototype work on backend design.
>>
>> In the section “Goals”, I defined (repeated for people that saw
>> the talk) the goals for the Global ISel design.
>> - Do you see anything missing?
>> - Do you see something that should not be there?
>>
>> The prototype will answer critical design questions (see “Design
>> Questions the Prototype Addresses at the End of M1” for examples) before
>> the actual design of Global ISel is finalized, but it cannot cover
>> everything.
>> Specifically we will *not* look into improving TableGen or reusing
>> InstCombine (see “Proposed Approach” for the rationale). Please let me know
>> if you see any issue with that.
>>
>> There is also basic ground work needed to prepare for Global ISel and I
>> need to extend the core MachineInstr-level APIs as explained during the
>> talk. For this, I prepared sketches of patches to illustrate them and
>> describe the details in the “Implications” section below. Please have a
>> look at the patches to have a better idea of the expected impact.
>>
>> If there is anything else you want to discuss related to Global ISel,
>> feel free to reach out to me. In particular, several people expressed
>> interest during the LLVM Dev Meeting in contributing to the project. Let me
>> know what your area of interest is, so that we can coordinate our efforts.
>> Anyhow, please add [GlobalISel] in the subject line to help
>> categorize the emails.
>> 
>> 
>> 
>> *** Goals ***
>> 
>> The high level goals of the new instruction selector are:
>> - Global instruction selector.
>> - Fast instruction selector.
>> - Shared code path for fast and good instruction selection.
>> - IR that represents ISA concepts better.
>> - More flexible instruction selector.
>> - Easier to maintain/understand framework, in particular legalization.
>> - Self contained machine representation, no back links to LLVM IR.
>> - No change to LLVM IR.
>> 
>> Note: The goals are common to all targets. In particular, we do not
intend to work on target-specific features for the prototype.
>> The bottom line is: please make sure those goals are compatible with
what you want to achieve for your target, even if your requirement is not
listed here.
>> 
>> 
>> 
>> *** Proposed Approach ***
>> 
>> In this section, I describe the approach I plan to pursue in the
prototype and the roadmap to get there. The final design will flow out of it.
>> 
>> For this prototype, we purposely exclude any work to improve or use
TableGen or InstCombine [final]. We will keep in mind, however, that some of the
C++ code we write will be table-generated at some point.
>> The rationale is that we do not want to lay down a new
TableGen/InstCombine infrastructure before being able to work on the ISel
framework itself.
>> 
>> The prototype vehicle will be AArch64. None of the changes for
GlobalISel will negatively impact the existing ISel.
>> 
>> 
>> ** High Level Prototype Design **
>> 
>> As shown in the talk, the expected pipeline for the prototype is:
>> LLVM IR -> IRTranslator -> Generic (G) MachineInstr ->
Legalizer -> RegBankSelect -> Select -> MachineInstr
>> 
>> Where:
>> - The intermediate representations (bold in the original message) are
LLVM IR, Generic (G) MachineInstr, and MachineInstr.
>> -  Generic MachineInstrs are machine instructions with a generic
opcode, e.g., ADD, COPY.
>> 
>> - IRTranslator: Translate LLVM IR to (G) MachineInstr.
>> - Legalizer: Legalize illegal (G) MachineInstr to legal (G)
MachineInstr.
>> - RegBankSelect: Assign a register bank to each virtual register that so
far only has a size.
>> - Select: Translate the remaining (G) MachineInstr to MachineInstr.
>> 
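As a rough illustration of the pipeline above, here is a minimal C++ sketch that models each pass as a step from one representation to the next. It is a toy model only: `ToyFunction`, `ToyPass`, `Stage`, `globalISelPipeline`, and `runPipeline` are hypothetical names made up for this example and are not LLVM types or APIs. Only the pass names and their order come from the proposal.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-ins for illustration; none of these are LLVM types.
enum class Stage { LLVMIR, Generic, Legalized, BankAssigned, Selected };

struct ToyFunction {
  Stage stage = Stage::LLVMIR;   // representation the function is currently in
  std::vector<std::string> log;  // names of the passes that ran, in order
};

struct ToyPass {
  std::string name;
  Stage expects;   // representation required on entry
  Stage produces;  // representation guaranteed on exit
};

// The four passes, in the order given in the proposal.
inline std::vector<ToyPass> globalISelPipeline() {
  return {
      {"IRTranslator", Stage::LLVMIR, Stage::Generic},
      {"Legalizer", Stage::Generic, Stage::Legalized},
      {"RegBankSelect", Stage::Legalized, Stage::BankAssigned},
      {"Select", Stage::BankAssigned, Stage::Selected},
  };
}

// Runs every pass in order. Returns false as soon as a pass is handed a
// function that is not in the representation it expects.
inline bool runPipeline(ToyFunction &F, const std::vector<ToyPass> &Passes) {
  for (const ToyPass &P : Passes) {
    if (F.stage != P.expects)
      return false;
    F.stage = P.produces;
    F.log.push_back(P.name);
  }
  return true;
}
```

Running `runPipeline` on a fresh `ToyFunction` with `globalISelPipeline()` leaves it in `Stage::Selected`. Reordering the passes makes the call fail, which mirrors the fact that each pass relies on the invariants established by the previous one.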
>> 
>> 
>> ** Implications **
>> 
>> As part of the bring-up of the prototype, we need to extend some of the
core MachineInstr-level APIs:
>>   - Need to remember FastMath flags for each MachineInstr.
>>   - Need to know the type of each MachineInstr. We don’t want ADD8,
ADD16, etc.
>>   - Extend the MachineRegisterInfo to support size as well as register
classes for virtual registers.
>> 
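To make the last two points more concrete, here is a small C++ sketch of a virtual register that starts with only a size and gets a register class later, so that a single generic ADD can replace ADD8, ADD16, and so on. It assumes nothing about the attached patches: `ToyVRegInfo`, `ToyRegisterInfo`, and `ToyGenericInstr` are hypothetical names, not the MachineRegisterInfo or MachineInstr APIs.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical model for illustration; not the MachineRegisterInfo API.
struct ToyVRegInfo {
  unsigned sizeInBits = 0;  // known as soon as the virtual register is created
  std::string regClass;     // empty until a later pass assigns a class
};

class ToyRegisterInfo {
  std::unordered_map<unsigned, ToyVRegInfo> VRegs;
  unsigned NextVReg = 0;

public:
  // Create a virtual register that only carries a size.
  unsigned createGenericVReg(unsigned sizeInBits) {
    unsigned R = NextVReg++;
    VRegs[R].sizeInBits = sizeInBits;
    return R;
  }

  // Constrain an existing virtual register to a register class.
  void setRegClass(unsigned R, const std::string &RC) {
    VRegs.at(R).regClass = RC;
  }

  unsigned getSize(unsigned R) const { return VRegs.at(R).sizeInBits; }
  bool hasRegClass(unsigned R) const { return !VRegs.at(R).regClass.empty(); }
};

// A generic instruction: one opcode, with the width taken from the
// registers instead of being baked into the opcode name.
struct ToyGenericInstr {
  std::string opcode;
  unsigned def, lhs, rhs;
};

// Width of the operation, read from the defined register.
inline unsigned operationWidth(const ToyGenericInstr &MI,
                               const ToyRegisterInfo &MRI) {
  return MRI.getSize(MI.def);
}
```

In this model the same `"ADD"` opcode serves a 32-bit and a 64-bit addition; the difference lives entirely in the register sizes.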
>> I have sketched the changes in the attached patches to help picture
how they would impact the existing APIs.
>> 
>>  
>> Note: I do not intend to commit those changes as they are. They will go
through the usual review process in due time.
>> 
>> 
>> The patches contain “// ***”-like comments that give a rough explanation
of why those changes are needed w.r.t. the goals.
>> The order of the patches could be modified since the dependencies
between them are not sequential. Anyhow, here are the patches:
>> 1. Introduce (some of) the generic opcodes.
>> 2. Make MachineFunction more independent of LLVM IR to eventually be
able to delete the LLVM IR instance from the memory.
>> 3. Extend MachineInstr to represent additional information attached to
generic opcodes.
>> 4. Teach MachineRegisterInfo about size for virtual registers.
>> 5. Introduce a helper class to build MachineInstr related objects.
>> 6. Add new target hooks to lower the ABI directly to MachineInstr.
>> 7. Introduce the IRTranslator pass.
>> 
>> 
>> ** Roadmap for the Prototype **
>> 
>> We plan to split the prototype into three main milestones:
>> 1. Translation: LLVM IR to (G) MachineInstr translation.
>> 2. Basic selector: Legal LLVM IR to target-specific MachineInstr.
>> 3. Simple legalization: Support scalar type legalization and some
vector instructions.
>> 
>> Notes:
>> - For #1, we will not support any fancy instructions like landing pad
or switch.
>> - Each milestone should take about 3-4 months.
>> 
>> - At the end of #2, we would have a FastISel-like selector.
>> 
>> Each milestone will be detailed right before starting it. The rationale
is that we want to accommodate what we discovered with the prototype for the
next milestone. In other words, in this email, I only describe the first
milestone in detail and I will give more details on the next milestone shortly
before we start it, and so on. For your information, here is the remainder of the
intended roadmap for the full project:
>> 4. Productization: Clean up implementation, stabilize the APIs.
>> 5. Complex legalization: Extend legalization support to everything
missing.
>> 6. Completeness: Fill the blanks, e.g., landing pad.
>> 7. Clean-up and performance: Add the necessary bits to be at parity or
beat SelectionDAG generated code.
>> 8. Transition: Document how to switch, provide tools to help.
>> 
>> 
>> ** Milestone 1 **
>> 
>> The first phase is focused on the IRTranslator pass.
>> 
>> The IRTranslator is responsible for translating the LLVM IR into
Generic MachineInstr. The IRTranslator pass uses some target hooks to perform
the ABI lowering. We can either define a new API for them, e.g.,
ABILoweringInfo, or extend the existing TargetLowering.
>> Moreover, the prototype will focus on simple instructions, i.e., we will
not support switch or landing pad for this iteration.
>> 
>> At the end of M1, the prototype will not be able to produce code, since
we would only have the beginning of the Global ISel pipeline. Instead, we will
test the IRTranslator on the generic output that is produced from the tested IR.
>> 
>> * Design Decisions *
>> 
>> - The IRTranslator is a final class. Its purpose is to move away from
LLVM IR to the MachineInstr world [final].
>> - Lower the ABI as part of the translation process [final].
>> 
>> * Design Questions the Prototype Addresses at the End of M1 *
>> 
>> - Handling of aggregate types during the translation.
>> - Lowering of switches.
>> - What about Module pass for Machine pass?
>> - Introduce new APIs to have a clearer separation between:
>>   - Legalization (setOperationAction, etc.)
>>   - Cost/Combine related (isXXXFree, etc.)
>>   - Lowering related (LowerFormal, etc.)
>> - What is the contract with the backends? Is it still “should be able
to select any valid LLVM IR”?
>> 
>> Thanks,
>> 
>> -Quentin
>> 
>>  
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
<http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev>
>> 
> 