thr3ads.net - llvm dev - [llvm-dev] [GlobalISel] A Proposal for global instruction selection [Jan 2016]

If this information is useful, please help other people find it:
Share via:

James Molloy via llvm-dev

2016-Jan-15 08:45 UTC

[llvm-dev] [GlobalISel] A Proposal for global instruction selection

Hi,
> "It is always a no-op cast because no bits change with this
conversion.The conversion is done as if the value had been stored to memory and read
back as type ty2."

I think a simple "as-if" in there should be sufficient;

"It is always a no-op cast because it acts as if no bits change with this
conversion. The conversion is done as if the value had been stored to
memory and read back as type ty2."

What do you think?

James

On Thu, 14 Jan 2016 at 22:35 Hal Finkel <hfinkel at anl.gov> wrote:
> ----- Original Message -----
> > From: "Philip Reames" <listmail at philipreames.com>
> > To: "James Molloy" <james at jamesmolloy.co.uk>,
"Daniel Sanders" <
> Daniel.Sanders at imgtec.com>, "Hal Finkel"
> > <hfinkel at anl.gov>
> > Cc: llvm-dev at lists.llvm.org
> > Sent: Thursday, January 14, 2016 3:48:37 PM
> > Subject: Re: [llvm-dev] [GlobalISel] A Proposal for global instruction
> selection
> >
> > This explanation makes a lot more sense to me. I think it would make
> > sense to document this mental model, but I agree that this
> > interpretation does not seem to require changes to the IR semantics.
>
> The semantics, no. But we still may want to update the language reference.
> It says, "It is always a no-op cast because no bits change with this
> conversion. The conversion is done as if the value had been stored to
> memory and read back as type ty2." And, what we've learned, is
that this
> second sentence does not always imply the first (the bits might, in fact,
> change).
>
>  -Hal
>
> >
> > Just to check, this implies that DSE *is* legal right?
> >
> > Philip
> >
> >
> > On 01/14/2016 05:48 AM, James Molloy wrote:
> >
> >
> >
> > Hi,
> >
> >
> > I've given a bit of misinformation here and have caused some
> > confusion. After talking with Tim and Mehdi last night on IRC, I
> > need to correct what I said above to fall more in line with what
> > Daniel is saying. If any of the below contradicts what I've said
> > already, please accept my apologies. This version should be right.
> >
> >
> > The behaviour of the code generator for big-endian NEON and MIPS is
> > derived from the fact that we did not want to change IR semantics at
> > all. A fundamental property that we do not want to break is memory
> > round-tripping:
> >
> >
> > %1 = load <4 x i32>, %p32
> > %2 = bitcast <4 x i32> %1 to <2 x i64>
> > store <2 x i64> %2, (bitcast %p32 to <2 x i64>*)
> >
> >
> > The value of memory before and after the store MUST NOT change
> > (contrary to what I said in an earlier post, I know).
> >
> >
> > So in fact everything you can do in IR is valid. There are no changes
> > to IR semantics in the slightest. However, when it comes to
> > generating code from the IR, there are new rules:
> > 1) Loads and stores are selected to be special loads and stores that
> > do some transform from a canonical form in memory to a type-specific
> > form in register.
> > 2) Because bitcasts are load/store pairs in semantic, they must
> > behave as if a store then load was done. Specifically (bitcast TyA
> > to TyB) must transform TyA -> canonical form -> TyB, as a store
then
> > load would. Therefore bitcasts are not no-ops during code generation
> > (*but behave as if they are from an IR perspective!*).
> >
> >
> > The reason this works neatly in IR is due to the IR's type system.
In
> > order to change type, a cast must be inserted or a memory round
> > trip. There is no other way. However in SDAG, things break down a
> > bit. SDAG is more weakly typed, and bitconverts are often simply
> > removed. We need that not to happen. Bitconverts are not no-ops.
> >
> >
> > Daniel's explanation of physical register mapping was excellent so
> > I'm not going to repeat that.
> >
> >
> > I apologise for the confusion and misinformation. This is quite a
> > complex topic and takes a bit of mind bending for me to understand,
> > and it was a long time ago.
> >
> >
> > James
> >
> >
> > On Thu, 14 Jan 2016 at 13:17 Daniel Sanders <
> > Daniel.Sanders at imgtec.com > wrote:
> >
> >
> >
> >
> >
> >
> > > Ok. Then we need to change the LangRef as suggested. Given this
is
> > > a rather important semantic change, I think you need to send a
top
> > > level RFC to the list.
> >
> >
> >
> >
> >
> > FWIW, I don't think this is a semantic change to LLVM-IR itself. I
> > think it's more clearing up the misconception that LLVM-IR
semantics
> > also apply to SelectionDAG's operations. That said, I do think
it's
> > important to mention this in LangRef since it's very easy to make
> > this mistake and very few targets need to worry about the
> > distinction.
> >
> >
> >
> > To explain why I don't think this is a semantic change to LLVM-IR,
> > let's consider this example from earlier:
> >
> >
> >
> > %0 = load <4 x i32> %x
> > %1 = bitcast <4 x i32> %0 to <2 x i64>
> >
> >
> >
> >
> > store <2 x i64> %1, <2 x i64>* %y
> >
> >
> >
> >
> >
> > In LLVM-IR terms, if the value of %0 is:
> >
> > %0 = 0x00112233_44556677_8899aabb_ccddeeff
> >
> > then the value of %1 is:
> >
> > %1 = 0x0011223344556677_8899aabbccddeeff
> >
> > which agrees with the store/load and the 'no bits change'
statements
> > in LangRef.
> >
> >
> >
> > However, the mapping of these bits to physical register bits is not
> > consistent between types:
> >
> > Physreg(%0) = 0xccddeeff_8899aabb_44556677_00112233
> >
> > Physreg(%1) = 0x8899aabbccddeeff_0011223344556677
> >
> >
> >
> > Essentially, I'm saying that BitCastInst and ISD::BITCAST have
> > slightly different semantics because of their different domains. The
> > former is working on an abstract representation of the values where
> > both statements in LangRef are true, but the latter is closer to the
> > target where the 'no bits change' statement ceases to be true
in
> > some cases.
> >
> >
> >
> > > A couple of points that will need clarified:
> > > - Does this only apply to vector types? It definitely doesn't
apply
> > > between pointer types today. What about integer, floating point,
> > > and FCAs?
> >
> >
> >
> >
> >
> > I've only seen it for vector types so far but in theory it could
> > happen for other types. I'd expect FCAs to encounter it since the
> > physical registers may contain padding that isn't present in the
> > LLVM-IR representation and the placement and amount of padding will
> > depend on the exact FCA.
> >
> > I can think of cases where address space casts can encounter the same
> > problem but that's already been covered in LangRef ("It can
be a
> > no-op cast or a complex value modification, depending on the target
> > and the address space pair.").
> >
> >
> >
> > Does anyone use FCAs directly? Most targets seem to convert them to
> > same-sized integers or bitcast an FCA* to i8*.
> >
> >
> >
> >
> > > - Is combining two casts into one a legal operation? I think it
is
> > > so far, but we need to explicitly state that.
> >
> >
> >
> >
> >
> > Yes, A->B->C and A->C are equivalent.
> >
> >
> >
> >
> > > - Do we have a predicate for identifying no-op casts that can be
> > > freely removed/combined?
> >
> >
> >
> >
> >
> > James mentioned one in CGP but I haven't been able to find it. I
> > don't think it's necessary to have one at the LLVM-IR level
but we
> > do need one in the backends. I remember adding one to the backend
> > but I can't find that either so I think I'm remembering one of
my
> > patches from before I split MSA's registers into type-specific
> > classes.
> >
> >
> >
> >
> > > - Is coercing a load to the type it's immediately bitcast to
legal
> > > under this model?
> >
> >
> >
> >
> >
> > Yes.
> >
> >
> >
> >
> >
> >
> > From: llvm-dev [mailto: llvm-dev-bounces at lists.llvm.org ] On Behalf
> > Of Philip Reames via llvm-dev
> > Sent: 13 January 2016 20:31
> > To: James Molloy; Hal Finkel
> > Cc: llvm-dev
> >
> >
> >
> >
> >
> >
> >
> > Subject: Re: [llvm-dev] [GlobalISel] A Proposal for global
> > instruction selection
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > On 01/13/2016 12:20 PM, James Molloy wrote:
> >
> >
> >
> >
> > > (Right?)
> >
> >
> >
> >
> >
> > Uh no, the register content explicitly does change :( We insert REV
> > instructions (byteswap) on each bitcast. Bitcasts can be merged and
> > elided etc, but conceptually there's a register content change on
> > every bitcast.
> >
> > Ok. Then we need to change the LangRef as suggested. Given this is a
> > rather important semantic change, I think you need to send a top
> > level RFC to the list.
> >
> > A couple of points that will need clarified:
> > - Does this only apply to vector types? It definitely doesn't
apply
> > between pointer types today. What about integer, floating point, and
> > FCAs?
> > - Is combining two casts into one a legal operation? I think it is so
> > far, but we need to explicitly state that.
> > - Do we have a predicate for identifying no-op casts that can be
> > freely removed/combined?
> > - Is coercing a load to the type it's immediately bitcast to legal
> > under this model?
> >
> >
> >
> >
> >
> >
> >
> >
> > James
> >
> >
> >
> >
> >
> > On Wed, 13 Jan 2016 at 18:09 Philip Reames <
> > listmail at philipreames.com > wrote:
> >
> >
> >
> >
> >
> > On 01/13/2016 08:01 AM, Hal Finkel via llvm-dev wrote:
> > > ----- Original Message -----
> > >> From: "James Molloy" < james at
jamesmolloy.co.uk >
> > >> To: "Hal Finkel" < hfinkel at anl.gov >
> > >> Cc: "llvm-dev" < llvm-dev at lists.llvm.org
>, "Quentin Colombet" <
> > >> qcolombet at apple.com >
> > >> Sent: Wednesday, January 13, 2016 9:54:26 AM
> > >> Subject: Re: [llvm-dev] [GlobalISel] A Proposal for global
> > >> instruction selection
> > >>
> > >>
> > >>> I think that teaching the optimizer about big-Endian lane
> > >>> ordering
> > >>> would have been better.
> > >>
> > >> It's certainly arguable. Even in hindsight I'm glad
we didn't -
> > >> that's the approach GCC took and they've been fixing
subtle bugs
> > >> in
> > >> their vectorizer ever since.
> > >>
> > >>
> > >>> Inserting the REV after every LDR
> > >>
> > >> We only do this conceptually. In most cases REVs cancel out,
and
> > >> we
> > >> have the LD1 instruction which is LDR+REV. With enough
peepholes
> > >> there's really no need for code to run slower.
> > >>
> > >>
> > >>> Given what's been done, should we update the LangRef.
> > >>
> > >> Potentially, yes. I hadn't realised quite how strongly
worded it
> > >> was
> > >> with respect to this.
> > >>
> > > Please do ;)
> > I'm not sure changing bitcast is the right place. Since the
bitcast
> > is
> > representing the in-register value (which doesn't change), maybe
we
> > should define it as part of the load/store instead? That's
> > essentially
> > what's going on; we're converting from a canonical register
form to a
> > variety of memory forms. (Right?)
> > >
> > > -Hal
> > >
> > >> James
> > >>
> > >>
> > >> On Wed, 13 Jan 2016 at 14:39 Hal Finkel < hfinkel at
anl.gov > wrote:
> > >>
> > >>
> > >>
> > >>
> > >> [resending so the message is smaller]
> > >>
> > >>
> > >>
> > >>
> > >>
> > >>
> > >> From: "James Molloy via llvm-dev" < llvm-dev at
lists.llvm.org >
> > >> To: "Quentin Colombet" < qcolombet at apple.com
>
> > >> Cc: "llvm-dev" < llvm-dev at lists.llvm.org >
> > >> Sent: Wednesday, January 13, 2016 2:35:32 AM
> > >> Subject: Re: [llvm-dev] [GlobalISel] A Proposal for global
> > >> instruction selection
> > >>
> > >> Hi Philip,
> > >>
> > >>
> > >>
> > >>
> > >>
> > >> store <2 x i64> %1, <2 x i64>* %y
> > >>
> > >> Yes. The memory pattern differs. This is the first diagram on
the
> > >> right at: http://llvm.org/docs/BigEndianNEON.html#bitconverts
)
> > >>
> > >>
> > >> I think that teaching the optimizer about big-Endian lane
ordering
> > >> would have been better. Inserting the REV after every LDR
sounds
> > >> very similar to what we do for VSX on little-Endian PowerPC
> > >> systems
> > >> (PowerPC may have a slight advantage here in that we
don't need to
> > >> do insertelement / extractelement / shufflevector through
memory
> > >> on
> > >> systems where little-Endian mode is relevant, see
> > >>
>
http://llvm.org/devmtg/2014-10/Slides/Schmidt-SupportingVectorProgramming.pdf
> > >> ).
> > >>
> > >> Given what's been done, should we update the LangRef. It
currently
> > >> reads, " The ‘ bitcast ‘ instruction converts value to
type ty2 .
> > >> It
> > >> is always a no-op cast because no bits change with this
> > >> conversion.
> > >> The conversion is done as if the value had been stored to
memory
> > >> and
> > >> read back as type ty2 ." But this is now, at the least,
> > >> misleading,
> > >> because this process of storing the value as one type and
reading
> > >> it
> > >> back in as another does, in fact, change the bits. We need to
make
> > >> clear that this might change the bits (perhaps specifically
by
> > >> calling out this case of vector bitcasts on big-Endian
systems?).
> > >>
> > >>
> > >>
> > >> Also, regarding this, " Most operating systems however
do not run
> > >> with alignment faults enabled, so this is often not an
issue." Are
> > >> you saying that the processor does the correct thing in this
case
> > >> (if alignment faults are not enabled, then it performs a
proper
> > >> unaligned load), or that the operating-system trap handler
> > >> emulates
> > >> the unaligned load should one occur?
> > >>
> > >> Thanks again,
> > >> Hal
> > >>
> > >>
> > >> _______________________________________________
> > >>
> > >>
> > >> LLVM Developers mailing list
> > >> llvm-dev at lists.llvm.org
> > >> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> > >>
> > >>
> > >> --
> > >> Hal Finkel
> > >> Assistant Computational Scientist
> > >> Leadership Computing Facility
> > >> Argonne National Laboratory
> > >>
> >
> >
> >
>
> --
> Hal Finkel
> Assistant Computational Scientist
> Leadership Computing Facility
> Argonne National Laboratory
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160115/335de1ff/attachment.html>

Daniel Sanders via llvm-dev

2016-Jan-15 10:29 UTC

head link

[llvm-dev] [GlobalISel] A Proposal for global instruction selection

Hi,

I think we just need to draw attention to the fact that other IR's may vary.

I'm thinking we should add something like this to the ISD::BITCAST doxygen
comment:
This is subtly different from the bitcast instruction from LLVM-IR since this
node may change the bits
in the register. For example, this occurs on big-endian NEON and big-endian MSA
where the layout
of the bits in the register depends on the vector type and this node acts as a
shuffle operation for
some vector type combinations.

And have LangRef say something like:
The conversion is done as if the value had been stored to memory and read back
as type ty2. This is equivalent to a no-op cast where no bits change with this
conversion.
                .. caution::
    This equivalence does not necessarily apply to other IR's in LLVM. See
ISD::BITCAST for an example.
The '.. caution::' should render in the same way as the
'Rationale' box in
http://llvm.org/docs/LangRef.html#volatile-memory-accesses.

From: James Molloy [mailto:james at jamesmolloy.co.uk]
Sent: 15 January 2016 08:46
To: Hal Finkel; Philip Reames
Cc: llvm-dev at lists.llvm.org; Daniel Sanders
Subject: Re: [llvm-dev] [GlobalISel] A Proposal for global instruction selection

Hi,
> "It is always a no-op cast because no bits change with this
conversion. The conversion is done as if the value had been stored to memory and
read back as type ty2."
I think a simple "as-if" in there should be sufficient;

"It is always a no-op cast because it acts as if no bits change with this
conversion. The conversion is done as if the value had been stored to memory and
read back as type ty2."

What do you think?

James

On Thu, 14 Jan 2016 at 22:35 Hal Finkel <hfinkel at anl.gov<mailto:hfinkel
at anl.gov>> wrote:
----- Original Message -----> From: "Philip Reames" <listmail at
philipreames.com<mailto:listmail at philipreames.com>>
> To: "James Molloy" <james at jamesmolloy.co.uk<mailto:james
at jamesmolloy.co.uk>>, "Daniel Sanders" <Daniel.Sanders at
imgtec.com<mailto:Daniel.Sanders at imgtec.com>>, "Hal
Finkel"
> <hfinkel at anl.gov<mailto:hfinkel at anl.gov>>
> Cc: llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org>
> Sent: Thursday, January 14, 2016 3:48:37 PM
> Subject: Re: [llvm-dev] [GlobalISel] A Proposal for global instruction
selection
>
> This explanation makes a lot more sense to me. I think it would make
> sense to document this mental model, but I agree that this
> interpretation does not seem to require changes to the IR semantics.
The semantics, no. But we still may want to update the language reference. It
says, "It is always a no-op cast because no bits change with this
conversion. The conversion is done as if the value had been stored to memory and
read back as type ty2." And, what we've learned, is that this second
sentence does not always imply the first (the bits might, in fact, change).

 -Hal
>
> Just to check, this implies that DSE *is* legal right?
>
> Philip
>
>
> On 01/14/2016 05:48 AM, James Molloy wrote:
>
>
>
> Hi,
>
>
> I've given a bit of misinformation here and have caused some
> confusion. After talking with Tim and Mehdi last night on IRC, I
> need to correct what I said above to fall more in line with what
> Daniel is saying. If any of the below contradicts what I've said
> already, please accept my apologies. This version should be right.
>
>
> The behaviour of the code generator for big-endian NEON and MIPS is
> derived from the fact that we did not want to change IR semantics at
> all. A fundamental property that we do not want to break is memory
> round-tripping:
>
>
> %1 = load <4 x i32>, %p32
> %2 = bitcast <4 x i32> %1 to <2 x i64>
> store <2 x i64> %2, (bitcast %p32 to <2 x i64>*)
>
>
> The value of memory before and after the store MUST NOT change
> (contrary to what I said in an earlier post, I know).
>
>
> So in fact everything you can do in IR is valid. There are no changes
> to IR semantics in the slightest. However, when it comes to
> generating code from the IR, there are new rules:
> 1) Loads and stores are selected to be special loads and stores that
> do some transform from a canonical form in memory to a type-specific
> form in register.
> 2) Because bitcasts are load/store pairs in semantic, they must
> behave as if a store then load was done. Specifically (bitcast TyA
> to TyB) must transform TyA -> canonical form -> TyB, as a store then
> load would. Therefore bitcasts are not no-ops during code generation
> (*but behave as if they are from an IR perspective!*).
>
>
> The reason this works neatly in IR is due to the IR's type system. In
> order to change type, a cast must be inserted or a memory round
> trip. There is no other way. However in SDAG, things break down a
> bit. SDAG is more weakly typed, and bitconverts are often simply
> removed. We need that not to happen. Bitconverts are not no-ops.
>
>
> Daniel's explanation of physical register mapping was excellent so
> I'm not going to repeat that.
>
>
> I apologise for the confusion and misinformation. This is quite a
> complex topic and takes a bit of mind bending for me to understand,
> and it was a long time ago.
>
>
> James
>
>
> On Thu, 14 Jan 2016 at 13:17 Daniel Sanders <
> Daniel.Sanders at imgtec.com<mailto:Daniel.Sanders at imgtec.com>
> wrote:
>
>
>
>
>
>
> > Ok. Then we need to change the LangRef as suggested. Given this is
> > a rather important semantic change, I think you need to send a top
> > level RFC to the list.
>
>
>
>
>
> FWIW, I don't think this is a semantic change to LLVM-IR itself. I
> think it's more clearing up the misconception that LLVM-IR semantics
> also apply to SelectionDAG's operations. That said, I do think it's
> important to mention this in LangRef since it's very easy to make
> this mistake and very few targets need to worry about the
> distinction.
>
>
>
> To explain why I don't think this is a semantic change to LLVM-IR,
> let's consider this example from earlier:
>
>
>
> %0 = load <4 x i32> %x
> %1 = bitcast <4 x i32> %0 to <2 x i64>
>
>
>
>
> store <2 x i64> %1, <2 x i64>* %y
>
>
>
>
>
> In LLVM-IR terms, if the value of %0 is:
>
> %0 = 0x00112233_44556677_8899aabb_ccddeeff
>
> then the value of %1 is:
>
> %1 = 0x0011223344556677_8899aabbccddeeff
>
> which agrees with the store/load and the 'no bits change'
statements
> in LangRef.
>
>
>
> However, the mapping of these bits to physical register bits is not
> consistent between types:
>
> Physreg(%0) = 0xccddeeff_8899aabb_44556677_00112233
>
> Physreg(%1) = 0x8899aabbccddeeff_0011223344556677
>
>
>
> Essentially, I'm saying that BitCastInst and ISD::BITCAST have
> slightly different semantics because of their different domains. The
> former is working on an abstract representation of the values where
> both statements in LangRef are true, but the latter is closer to the
> target where the 'no bits change' statement ceases to be true in
> some cases.
>
>
>
> > A couple of points that will need clarified:
> > - Does this only apply to vector types? It definitely doesn't
apply
> > between pointer types today. What about integer, floating point,
> > and FCAs?
>
>
>
>
>
> I've only seen it for vector types so far but in theory it could
> happen for other types. I'd expect FCAs to encounter it since the
> physical registers may contain padding that isn't present in the
> LLVM-IR representation and the placement and amount of padding will
> depend on the exact FCA.
>
> I can think of cases where address space casts can encounter the same
> problem but that's already been covered in LangRef ("It can be a
> no-op cast or a complex value modification, depending on the target
> and the address space pair.").
>
>
>
> Does anyone use FCAs directly? Most targets seem to convert them to
> same-sized integers or bitcast an FCA* to i8*.
>
>
>
>
> > - Is combining two casts into one a legal operation? I think it is
> > so far, but we need to explicitly state that.
>
>
>
>
>
> Yes, A->B->C and A->C are equivalent.
>
>
>
>
> > - Do we have a predicate for identifying no-op casts that can be
> > freely removed/combined?
>
>
>
>
>
> James mentioned one in CGP but I haven't been able to find it. I
> don't think it's necessary to have one at the LLVM-IR level but we
> do need one in the backends. I remember adding one to the backend
> but I can't find that either so I think I'm remembering one of my
> patches from before I split MSA's registers into type-specific
> classes.
>
>
>
>
> > - Is coercing a load to the type it's immediately bitcast to legal
> > under this model?
>
>
>
>
>
> Yes.
>
>
>
>
>
>
> From: llvm-dev [mailto: llvm-dev-bounces at
lists.llvm.org<mailto:llvm-dev-bounces at lists.llvm.org> ] On Behalf
> Of Philip Reames via llvm-dev
> Sent: 13 January 2016 20:31
> To: James Molloy; Hal Finkel
> Cc: llvm-dev
>
>
>
>
>
>
>
> Subject: Re: [llvm-dev] [GlobalISel] A Proposal for global
> instruction selection
>
>
>
>
>
>
>
>
>
> On 01/13/2016 12:20 PM, James Molloy wrote:
>
>
>
>
> > (Right?)
>
>
>
>
>
> Uh no, the register content explicitly does change :( We insert REV
> instructions (byteswap) on each bitcast. Bitcasts can be merged and
> elided etc, but conceptually there's a register content change on
> every bitcast.
>
> Ok. Then we need to change the LangRef as suggested. Given this is a
> rather important semantic change, I think you need to send a top
> level RFC to the list.
>
> A couple of points that will need clarified:
> - Does this only apply to vector types? It definitely doesn't apply
> between pointer types today. What about integer, floating point, and
> FCAs?
> - Is combining two casts into one a legal operation? I think it is so
> far, but we need to explicitly state that.
> - Do we have a predicate for identifying no-op casts that can be
> freely removed/combined?
> - Is coercing a load to the type it's immediately bitcast to legal
> under this model?
>
>
>
>
>
>
>
>
> James
>
>
>
>
>
> On Wed, 13 Jan 2016 at 18:09 Philip Reames <
> listmail at philipreames.com<mailto:listmail at philipreames.com>
> wrote:
>
>
>
>
>
> On 01/13/2016 08:01 AM, Hal Finkel via llvm-dev wrote:
> > ----- Original Message -----
> >> From: "James Molloy" < james at
jamesmolloy.co.uk<mailto:james at jamesmolloy.co.uk> >
> >> To: "Hal Finkel" < hfinkel at
anl.gov<mailto:hfinkel at anl.gov> >
> >> Cc: "llvm-dev" < llvm-dev at
lists.llvm.org<mailto:llvm-dev at lists.llvm.org> >, "Quentin
Colombet" <
> >> qcolombet at apple.com<mailto:qcolombet at apple.com> >
> >> Sent: Wednesday, January 13, 2016 9:54:26 AM
> >> Subject: Re: [llvm-dev] [GlobalISel] A Proposal for global
> >> instruction selection
> >>
> >>
> >>> I think that teaching the optimizer about big-Endian lane
> >>> ordering
> >>> would have been better.
> >>
> >> It's certainly arguable. Even in hindsight I'm glad we
didn't -
> >> that's the approach GCC took and they've been fixing
subtle bugs
> >> in
> >> their vectorizer ever since.
> >>
> >>
> >>> Inserting the REV after every LDR
> >>
> >> We only do this conceptually. In most cases REVs cancel out, and
> >> we
> >> have the LD1 instruction which is LDR+REV. With enough peepholes
> >> there's really no need for code to run slower.
> >>
> >>
> >>> Given what's been done, should we update the LangRef.
> >>
> >> Potentially, yes. I hadn't realised quite how strongly worded
it
> >> was
> >> with respect to this.
> >>
> > Please do ;)
> I'm not sure changing bitcast is the right place. Since the bitcast
> is
> representing the in-register value (which doesn't change), maybe we
> should define it as part of the load/store instead? That's
> essentially
> what's going on; we're converting from a canonical register form to
a
> variety of memory forms. (Right?)
> >
> > -Hal
> >
> >> James
> >>
> >>
> >> On Wed, 13 Jan 2016 at 14:39 Hal Finkel < hfinkel at
anl.gov<mailto:hfinkel at anl.gov> > wrote:
> >>
> >>
> >>
> >>
> >> [resending so the message is smaller]
> >>
> >>
> >>
> >>
> >>
> >>
> >> From: "James Molloy via llvm-dev" < llvm-dev at
lists.llvm.org<mailto:llvm-dev at lists.llvm.org> >
> >> To: "Quentin Colombet" < qcolombet at
apple.com<mailto:qcolombet at apple.com> >
> >> Cc: "llvm-dev" < llvm-dev at
lists.llvm.org<mailto:llvm-dev at lists.llvm.org> >
> >> Sent: Wednesday, January 13, 2016 2:35:32 AM
> >> Subject: Re: [llvm-dev] [GlobalISel] A Proposal for global
> >> instruction selection
> >>
> >> Hi Philip,
> >>
> >>
> >>
> >>
> >>
> >> store <2 x i64> %1, <2 x i64>* %y
> >>
> >> Yes. The memory pattern differs. This is the first diagram on the
> >> right at: http://llvm.org/docs/BigEndianNEON.html#bitconverts )
> >>
> >>
> >> I think that teaching the optimizer about big-Endian lane ordering
> >> would have been better. Inserting the REV after every LDR sounds
> >> very similar to what we do for VSX on little-Endian PowerPC
> >> systems
> >> (PowerPC may have a slight advantage here in that we don't
need to
> >> do insertelement / extractelement / shufflevector through memory
> >> on
> >> systems where little-Endian mode is relevant, see
> >>
http://llvm.org/devmtg/2014-10/Slides/Schmidt-SupportingVectorProgramming.pdf
> >> ).
> >>
> >> Given what's been done, should we update the LangRef. It
currently
> >> reads, " The ‘ bitcast ‘ instruction converts value to type
ty2 .
> >> It
> >> is always a no-op cast because no bits change with this
> >> conversion.
> >> The conversion is done as if the value had been stored to memory
> >> and
> >> read back as type ty2 ." But this is now, at the least,
> >> misleading,
> >> because this process of storing the value as one type and reading
> >> it
> >> back in as another does, in fact, change the bits. We need to make
> >> clear that this might change the bits (perhaps specifically by
> >> calling out this case of vector bitcasts on big-Endian systems?).
> >>
> >>
> >>
> >> Also, regarding this, " Most operating systems however do not
run
> >> with alignment faults enabled, so this is often not an
issue." Are
> >> you saying that the processor does the correct thing in this case
> >> (if alignment faults are not enabled, then it performs a proper
> >> unaligned load), or that the operating-system trap handler
> >> emulates
> >> the unaligned load should one occur?
> >>
> >> Thanks again,
> >> Hal
> >>
> >>
> >> _______________________________________________
> >>
> >>
> >> LLVM Developers mailing list
> >> llvm-dev at lists.llvm.org<mailto:llvm-dev at
lists.llvm.org>
> >> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> >>
> >>
> >> --
> >> Hal Finkel
> >> Assistant Computational Scientist
> >> Leadership Computing Facility
> >> Argonne National Laboratory
> >>
>
>
>
--
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160115/161233fc/attachment.html>

Hal Finkel via llvm-dev

2016-Jan-15 13:41 UTC

head link

[llvm-dev] [GlobalISel] A Proposal for global instruction selection

----- Original Message -----> From: "James Molloy" <james at jamesmolloy.co.uk>
> To: "Hal Finkel" <hfinkel at anl.gov>, "Philip
Reames" <listmail at philipreames.com>
> Cc: llvm-dev at lists.llvm.org, "Daniel Sanders"
<Daniel.Sanders at imgtec.com>
> Sent: Friday, January 15, 2016 2:45:32 AM
> Subject: Re: [llvm-dev] [GlobalISel] A Proposal for global instruction
selection
> 
> 
> Hi,
> 
> 
> > "It is always a no-op cast because no bits change with this
> > conversion. The conversion is done as if the value had been stored
> > to memory and read back as type ty2."
> 
> 
> I think a simple "as-if" in there should be sufficient;
> 
> 
> "It is always a no-op cast because it acts as if no bits change with
> this conversion. The conversion is done as if the value had been
> stored to memory and read back as type ty2."
> 
> 
> What do you think?
> 
I think this sounds confusing (and, regardless, we always get to apply 'as
if'). I see you're point, however, that any changes in the bits are
unobservable at the IR level. Is it true that int -> floating-point -> int
bitcasts round-trip cleanly for all possible values on all hardware? I was under
the impression that this was not true. I think that the best solution might just
be to delete the first sentence.

 -Hal
> 
> James
> 
> 
> On Thu, 14 Jan 2016 at 22:35 Hal Finkel < hfinkel at anl.gov > wrote:
> 
> 
> ----- Original Message -----
> > From: "Philip Reames" < listmail at philipreames.com >
> > To: "James Molloy" < james at jamesmolloy.co.uk >,
"Daniel Sanders" <
> > Daniel.Sanders at imgtec.com >, "Hal Finkel"
> > < hfinkel at anl.gov >
> > Cc: llvm-dev at lists.llvm.org
> > Sent: Thursday, January 14, 2016 3:48:37 PM
> > Subject: Re: [llvm-dev] [GlobalISel] A Proposal for global
> > instruction selection
> > 
> > This explanation makes a lot more sense to me. I think it would
> > make
> > sense to document this mental model, but I agree that this
> > interpretation does not seem to require changes to the IR
> > semantics.
> 
> The semantics, no. But we still may want to update the language
> reference. It says, "It is always a no-op cast because no bits
> change with this conversion. The conversion is done as if the value
> had been stored to memory and read back as type ty2." And, what
> we've learned, is that this second sentence does not always imply
> the first (the bits might, in fact, change).
> 
> -Hal
> 
> > 
> > Just to check, this implies that DSE *is* legal right?
> > 
> > Philip
> > 
> > 
> > On 01/14/2016 05:48 AM, James Molloy wrote:
> > 
> > 
> > 
> > Hi,
> > 
> > 
> > I've given a bit of misinformation here and have caused some
> > confusion. After talking with Tim and Mehdi last night on IRC, I
> > need to correct what I said above to fall more in line with what
> > Daniel is saying. If any of the below contradicts what I've said
> > already, please accept my apologies. This version should be right.
> > 
> > 
> > The behaviour of the code generator for big-endian NEON and MIPS is
> > derived from the fact that we did not want to change IR semantics
> > at
> > all. A fundamental property that we do not want to break is memory
> > round-tripping:
> > 
> > 
> > %1 = load <4 x i32>, %p32
> > %2 = bitcast <4 x i32> %1 to <2 x i64>
> > store <2 x i64> %2, (bitcast %p32 to <2 x i64>*)
> > 
> > 
> > The value of memory before and after the store MUST NOT change
> > (contrary to what I said in an earlier post, I know).
> > 
> > 
> > So in fact everything you can do in IR is valid. There are no
> > changes
> > to IR semantics in the slightest. However, when it comes to
> > generating code from the IR, there are new rules:
> > 1) Loads and stores are selected to be special loads and stores
> > that
> > do some transform from a canonical form in memory to a
> > type-specific
> > form in register.
> > 2) Because bitcasts are load/store pairs in semantic, they must
> > behave as if a store then load was done. Specifically (bitcast TyA
> > to TyB) must transform TyA -> canonical form -> TyB, as a store
> > then
> > load would. Therefore bitcasts are not no-ops during code
> > generation
> > (*but behave as if they are from an IR perspective!*).
> > 
> > 
> > The reason this works neatly in IR is due to the IR's type system.
> > In
> > order to change type, a cast must be inserted or a memory round
> > trip. There is no other way. However in SDAG, things break down a
> > bit. SDAG is more weakly typed, and bitconverts are often simply
> > removed. We need that not to happen. Bitconverts are not no-ops.
> > 
> > 
> > Daniel's explanation of physical register mapping was excellent so
> > I'm not going to repeat that.
> > 
> > 
> > I apologise for the confusion and misinformation. This is quite a
> > complex topic and takes a bit of mind bending for me to understand,
> > and it was a long time ago.
> > 
> > 
> > James
> > 
> > 
> > On Thu, 14 Jan 2016 at 13:17 Daniel Sanders <
> > Daniel.Sanders at imgtec.com > wrote:
> > 
> > 
> > 
> > 
> > 
> > 
> > > Ok. Then we need to change the LangRef as suggested. Given this
> > > is
> > > a rather important semantic change, I think you need to send a
> > > top
> > > level RFC to the list.
> > 
> > 
> > 
> > 
> > 
> > FWIW, I don't think this is a semantic change to LLVM-IR itself. I
> > think it's more clearing up the misconception that LLVM-IR
> > semantics
> > also apply to SelectionDAG's operations. That said, I do think
it's
> > important to mention this in LangRef since it's very easy to make
> > this mistake and very few targets need to worry about the
> > distinction.
> > 
> > 
> > 
> > To explain why I don't think this is a semantic change to LLVM-IR,
> > let's consider this example from earlier:
> > 
> > 
> > 
> > %0 = load <4 x i32> %x
> > %1 = bitcast <4 x i32> %0 to <2 x i64>
> > 
> > 
> > 
> > 
> > store <2 x i64> %1, <2 x i64>* %y
> > 
> > 
> > 
> > 
> > 
> > In LLVM-IR terms, if the value of %0 is:
> > 
> > %0 = 0x00112233_44556677_8899aabb_ccddeeff
> > 
> > then the value of %1 is:
> > 
> > %1 = 0x0011223344556677_8899aabbccddeeff
> > 
> > which agrees with the store/load and the 'no bits change'
> > statements
> > in LangRef.
> > 
> > 
> > 
> > However, the mapping of these bits to physical register bits is not
> > consistent between types:
> > 
> > Physreg(%0) = 0xccddeeff_8899aabb_44556677_00112233
> > 
> > Physreg(%1) = 0x8899aabbccddeeff_0011223344556677
> > 
> > 
> > 
> > Essentially, I'm saying that BitCastInst and ISD::BITCAST have
> > slightly different semantics because of their different domains.
> > The
> > former is working on an abstract representation of the values where
> > both statements in LangRef are true, but the latter is closer to
> > the
> > target where the 'no bits change' statement ceases to be true
in
> > some cases.
> > 
> > 
> > 
> > > A couple of points that will need clarified:
> > > - Does this only apply to vector types? It definitely doesn't
> > > apply
> > > between pointer types today. What about integer, floating point,
> > > and FCAs?
> > 
> > 
> > 
> > 
> > 
> > I've only seen it for vector types so far but in theory it could
> > happen for other types. I'd expect FCAs to encounter it since the
> > physical registers may contain padding that isn't present in the
> > LLVM-IR representation and the placement and amount of padding will
> > depend on the exact FCA.
> > 
> > I can think of cases where address space casts can encounter the
> > same
> > problem but that's already been covered in LangRef ("It can
be a
> > no-op cast or a complex value modification, depending on the target
> > and the address space pair.").
> > 
> > 
> > 
> > Does anyone use FCAs directly? Most targets seem to convert them to
> > same-sized integers or bitcast an FCA* to i8*.
> > 
> > 
> > 
> > 
> > > - Is combining two casts into one a legal operation? I think it
> > > is
> > > so far, but we need to explicitly state that.
> > 
> > 
> > 
> > 
> > 
> > Yes, A->B->C and A->C are equivalent.
> > 
> > 
> > 
> > 
> > > - Do we have a predicate for identifying no-op casts that can be
> > > freely removed/combined?
> > 
> > 
> > 
> > 
> > 
> > James mentioned one in CGP but I haven't been able to find it. I
> > don't think it's necessary to have one at the LLVM-IR level
but we
> > do need one in the backends. I remember adding one to the backend
> > but I can't find that either so I think I'm remembering one of
my
> > patches from before I split MSA's registers into type-specific
> > classes.
> > 
> > 
> > 
> > 
> > > - Is coercing a load to the type it's immediately bitcast to
> > > legal
> > > under this model?
> > 
> > 
> > 
> > 
> > 
> > Yes.
> > 
> > 
> > 
> > 
> > 
> > 
> > From: llvm-dev [mailto: llvm-dev-bounces at lists.llvm.org ] On Behalf
> > Of Philip Reames via llvm-dev
> > Sent: 13 January 2016 20:31
> > To: James Molloy; Hal Finkel
> > Cc: llvm-dev
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > Subject: Re: [llvm-dev] [GlobalISel] A Proposal for global
> > instruction selection
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > On 01/13/2016 12:20 PM, James Molloy wrote:
> > 
> > 
> > 
> > 
> > > (Right?)
> > 
> > 
> > 
> > 
> > 
> > Uh no, the register content explicitly does change :( We insert REV
> > instructions (byteswap) on each bitcast. Bitcasts can be merged and
> > elided etc, but conceptually there's a register content change on
> > every bitcast.
> > 
> > Ok. Then we need to change the LangRef as suggested. Given this is
> > a
> > rather important semantic change, I think you need to send a top
> > level RFC to the list.
> > 
> > A couple of points that will need clarified:
> > - Does this only apply to vector types? It definitely doesn't
apply
> > between pointer types today. What about integer, floating point,
> > and
> > FCAs?
> > - Is combining two casts into one a legal operation? I think it is
> > so
> > far, but we need to explicitly state that.
> > - Do we have a predicate for identifying no-op casts that can be
> > freely removed/combined?
> > - Is coercing a load to the type it's immediately bitcast to legal
> > under this model?
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > James
> > 
> > 
> > 
> > 
> > 
> > On Wed, 13 Jan 2016 at 18:09 Philip Reames <
> > listmail at philipreames.com > wrote:
> > 
> > 
> > 
> > 
> > 
> > On 01/13/2016 08:01 AM, Hal Finkel via llvm-dev wrote:
> > > ----- Original Message -----
> > >> From: "James Molloy" < james at
jamesmolloy.co.uk >
> > >> To: "Hal Finkel" < hfinkel at anl.gov >
> > >> Cc: "llvm-dev" < llvm-dev at lists.llvm.org
>, "Quentin Colombet" <
> > >> qcolombet at apple.com >
> > >> Sent: Wednesday, January 13, 2016 9:54:26 AM
> > >> Subject: Re: [llvm-dev] [GlobalISel] A Proposal for global
> > >> instruction selection
> > >> 
> > >> 
> > >>> I think that teaching the optimizer about big-Endian lane
> > >>> ordering
> > >>> would have been better.
> > >> 
> > >> It's certainly arguable. Even in hindsight I'm glad
we didn't -
> > >> that's the approach GCC took and they've been fixing
subtle bugs
> > >> in
> > >> their vectorizer ever since.
> > >> 
> > >> 
> > >>> Inserting the REV after every LDR
> > >> 
> > >> We only do this conceptually. In most cases REVs cancel out,
and
> > >> we
> > >> have the LD1 instruction which is LDR+REV. With enough
peepholes
> > >> there's really no need for code to run slower.
> > >> 
> > >> 
> > >>> Given what's been done, should we update the LangRef.
> > >> 
> > >> Potentially, yes. I hadn't realised quite how strongly
worded it
> > >> was
> > >> with respect to this.
> > >> 
> > > Please do ;)
> > I'm not sure changing bitcast is the right place. Since the
bitcast
> > is
> > representing the in-register value (which doesn't change), maybe
we
> > should define it as part of the load/store instead? That's
> > essentially
> > what's going on; we're converting from a canonical register
form to
> > a
> > variety of memory forms. (Right?)
> > > 
> > > -Hal
> > > 
> > >> James
> > >> 
> > >> 
> > >> On Wed, 13 Jan 2016 at 14:39 Hal Finkel < hfinkel at
anl.gov >
> > >> wrote:
> > >> 
> > >> 
> > >> 
> > >> 
> > >> [resending so the message is smaller]
> > >> 
> > >> 
> > >> 
> > >> 
> > >> 
> > >> 
> > >> From: "James Molloy via llvm-dev" < llvm-dev at
lists.llvm.org >
> > >> To: "Quentin Colombet" < qcolombet at apple.com
>
> > >> Cc: "llvm-dev" < llvm-dev at lists.llvm.org >
> > >> Sent: Wednesday, January 13, 2016 2:35:32 AM
> > >> Subject: Re: [llvm-dev] [GlobalISel] A Proposal for global
> > >> instruction selection
> > >> 
> > >> Hi Philip,
> > >> 
> > >> 
> > >> 
> > >> 
> > >> 
> > >> store <2 x i64> %1, <2 x i64>* %y
> > >> 
> > >> Yes. The memory pattern differs. This is the first diagram on
> > >> the
> > >> right at: http://llvm.org/docs/BigEndianNEON.html#bitconverts
)
> > >> 
> > >> 
> > >> I think that teaching the optimizer about big-Endian lane
> > >> ordering
> > >> would have been better. Inserting the REV after every LDR
sounds
> > >> very similar to what we do for VSX on little-Endian PowerPC
> > >> systems
> > >> (PowerPC may have a slight advantage here in that we
don't need
> > >> to
> > >> do insertelement / extractelement / shufflevector through
memory
> > >> on
> > >> systems where little-Endian mode is relevant, see
> > >>
http://llvm.org/devmtg/2014-10/Slides/Schmidt-SupportingVectorProgramming.pdf
> > >> ).
> > >> 
> > >> Given what's been done, should we update the LangRef. It
> > >> currently
> > >> reads, " The ‘ bitcast ‘ instruction converts value to
type ty2
> > >> .
> > >> It
> > >> is always a no-op cast because no bits change with this
> > >> conversion.
> > >> The conversion is done as if the value had been stored to
memory
> > >> and
> > >> read back as type ty2 ." But this is now, at the least,
> > >> misleading,
> > >> because this process of storing the value as one type and
> > >> reading
> > >> it
> > >> back in as another does, in fact, change the bits. We need to
> > >> make
> > >> clear that this might change the bits (perhaps specifically
by
> > >> calling out this case of vector bitcasts on big-Endian
> > >> systems?).
> > >> 
> > >> 
> > >> 
> > >> Also, regarding this, " Most operating systems however
do not
> > >> run
> > >> with alignment faults enabled, so this is often not an
issue."
> > >> Are
> > >> you saying that the processor does the correct thing in this
> > >> case
> > >> (if alignment faults are not enabled, then it performs a
proper
> > >> unaligned load), or that the operating-system trap handler
> > >> emulates
> > >> the unaligned load should one occur?
> > >> 
> > >> Thanks again,
> > >> Hal
> > >> 
> > >> 
> > >> _______________________________________________
> > >> 
> > >> 
> > >> LLVM Developers mailing list
> > >> llvm-dev at lists.llvm.org
> > >> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> > >> 
> > >> 
> > >> --
> > >> Hal Finkel
> > >> Assistant Computational Scientist
> > >> Leadership Computing Facility
> > >> Argonne National Laboratory
> > >> 
> > 
> > 
> > 
> 
> --
> Hal Finkel
> Assistant Computational Scientist
> Leadership Computing Facility
> Argonne National Laboratory
> 
-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory

Hal Finkel via llvm-dev

2016-Jan-15 13:46 UTC

head link

[llvm-dev] [GlobalISel] A Proposal for global instruction selection

----- Original Message -----
> From: "Daniel Sanders" <Daniel.Sanders at imgtec.com>
> To: "James Molloy" <james at jamesmolloy.co.uk>, "Hal
Finkel"
> <hfinkel at anl.gov>, "Philip Reames" <listmail at
philipreames.com>
> Cc: llvm-dev at lists.llvm.org
> Sent: Friday, January 15, 2016 4:29:33 AM
> Subject: RE: [llvm-dev] [GlobalISel] A Proposal for global
> instruction selection
> Hi,
> I think we just need to draw attention to the fact that other IR's
> may vary.
> I'm thinking we should add something like this to the ISD::BITCAST
> doxygen comment:
> This is subtly different from the bitcast instruction from LLVM-IR
> since this node may change the bits
> in the register. For example, this occurs on big-endian NEON and
> big-endian MSA where the layout
> of the bits in the register depends on the vector type and this node
> acts as a shuffle operation for
> some vector type combinations.I agree; improving the ISD::BITCAST documentation is a good idea. 
> And have LangRef say something like:
> The conversion is done as if the value had been stored to memory and
> read back as type ty2. This is equivalent to a no-op cast where no
> bits change with this conversion.
> .. caution::
> This equivalence does not necessarily apply to other IR's in LLVM.
> See ISD::BITCAST for an example.
> The '.. caution::' should render in the same way as the
'Rationale'
> box in http://llvm.org/docs/LangRef.html#volatile-memory-accesses.We should not do this. We try to keep the LangRef as implementation-independent
as possible, and thus, we don't explicitly discuss things like ISD nodes
there.

-Hal 
> From: James Molloy [mailto:james at jamesmolloy.co.uk]
> Sent: 15 January 2016 08:46
> To: Hal Finkel; Philip Reames
> Cc: llvm-dev at lists.llvm.org; Daniel Sanders
> Subject: Re: [llvm-dev] [GlobalISel] A Proposal for global
> instruction selection
> Hi,
> > "It is always a no-op cast because no bits change with this
> > conversion. The conversion is done as if the value had been stored
> > to memory and read back as type ty2."
> I think a simple "as-if" in there should be sufficient;
> "It is always a no-op cast because it acts as if no bits change with
> this conversion. The conversion is done as if the value had been
> stored to memory and read back as type ty2."
> What do you think?
> James
> On Thu, 14 Jan 2016 at 22:35 Hal Finkel < hfinkel at anl.gov > wrote:
> > ----- Original Message -----
> 
> > > From: "Philip Reames" < listmail at philipreames.com
>
> 
> > > To: "James Molloy" < james at jamesmolloy.co.uk
>, "Daniel Sanders"
> > > <
> > > Daniel.Sanders at imgtec.com >, "Hal Finkel"
> 
> > > < hfinkel at anl.gov >
> 
> > > Cc: llvm-dev at lists.llvm.org
> 
> > > Sent: Thursday, January 14, 2016 3:48:37 PM
> 
> > > Subject: Re: [llvm-dev] [GlobalISel] A Proposal for global
> > > instruction selection
> 
> > >
> 
> > > This explanation makes a lot more sense to me. I think it would
> > > make
> 
> > > sense to document this mental model, but I agree that this
> 
> > > interpretation does not seem to require changes to the IR
> > > semantics.
> 
> > The semantics, no. But we still may want to update the language
> > reference. It says, "It is always a no-op cast because no bits
> > change with this conversion. The conversion is done as if the value
> > had been stored to memory and read back as type ty2." And, what
> > we've learned, is that this second sentence does not always imply
> > the first (the bits might, in fact, change).
> 
> > -Hal
> 
> > >
> 
> > > Just to check, this implies that DSE *is* legal right?
> 
> > >
> 
> > > Philip
> 
> > >
> 
> > >
> 
> > > On 01/14/2016 05:48 AM, James Molloy wrote:
> 
> > >
> 
> > >
> 
> > >
> 
> > > Hi,
> 
> > >
> 
> > >
> 
> > > I've given a bit of misinformation here and have caused some
> 
> > > confusion. After talking with Tim and Mehdi last night on IRC, I
> 
> > > need to correct what I said above to fall more in line with what
> 
> > > Daniel is saying. If any of the below contradicts what I've
said
> 
> > > already, please accept my apologies. This version should be
> > > right.
> 
> > >
> 
> > >
> 
> > > The behaviour of the code generator for big-endian NEON and MIPS
> > > is
> 
> > > derived from the fact that we did not want to change IR semantics
> > > at
> 
> > > all. A fundamental property that we do not want to break is
> > > memory
> 
> > > round-tripping:
> 
> > >
> 
> > >
> 
> > > %1 = load <4 x i32>, %p32
> 
> > > %2 = bitcast <4 x i32> %1 to <2 x i64>
> 
> > > store <2 x i64> %2, (bitcast %p32 to <2 x i64>*)
> 
> > >
> 
> > >
> 
> > > The value of memory before and after the store MUST NOT change
> 
> > > (contrary to what I said in an earlier post, I know).
> 
> > >
> 
> > >
> 
> > > So in fact everything you can do in IR is valid. There are no
> > > changes
> 
> > > to IR semantics in the slightest. However, when it comes to
> 
> > > generating code from the IR, there are new rules:
> 
> > > 1) Loads and stores are selected to be special loads and stores
> > > that
> 
> > > do some transform from a canonical form in memory to a
> > > type-specific
> 
> > > form in register.
> 
> > > 2) Because bitcasts are load/store pairs in semantic, they must
> 
> > > behave as if a store then load was done. Specifically (bitcast
> > > TyA
> 
> > > to TyB) must transform TyA -> canonical form -> TyB, as a
store
> > > then
> 
> > > load would. Therefore bitcasts are not no-ops during code
> > > generation
> 
> > > (*but behave as if they are from an IR perspective!*).
> 
> > >
> 
> > >
> 
> > > The reason this works neatly in IR is due to the IR's type
> > > system.
> > > In
> 
> > > order to change type, a cast must be inserted or a memory round
> 
> > > trip. There is no other way. However in SDAG, things break down a
> 
> > > bit. SDAG is more weakly typed, and bitconverts are often simply
> 
> > > removed. We need that not to happen. Bitconverts are not no-ops.
> 
> > >
> 
> > >
> 
> > > Daniel's explanation of physical register mapping was
excellent
> > > so
> 
> > > I'm not going to repeat that.
> 
> > >
> 
> > >
> 
> > > I apologise for the confusion and misinformation. This is quite a
> 
> > > complex topic and takes a bit of mind bending for me to
> > > understand,
> 
> > > and it was a long time ago.
> 
> > >
> 
> > >
> 
> > > James
> 
> > >
> 
> > >
> 
> > > On Thu, 14 Jan 2016 at 13:17 Daniel Sanders <
> 
> > > Daniel.Sanders at imgtec.com > wrote:
> 
> > >
> 
> > >
> 
> > >
> 
> > >
> 
> > >
> 
> > >
> 
> > > > Ok. Then we need to change the LangRef as suggested. Given
this
> > > > is
> 
> > > > a rather important semantic change, I think you need to send
a
> > > > top
> 
> > > > level RFC to the list.
> 
> > >
> 
> > >
> 
> > >
> 
> > >
> 
> > >
> 
> > > FWIW, I don't think this is a semantic change to LLVM-IR
itself.
> > > I
> 
> > > think it's more clearing up the misconception that LLVM-IR
> > > semantics
> 
> > > also apply to SelectionDAG's operations. That said, I do
think
> > > it's
> 
> > > important to mention this in LangRef since it's very easy to
make
> 
> > > this mistake and very few targets need to worry about the
> 
> > > distinction.
> 
> > >
> 
> > >
> 
> > >
> 
> > > To explain why I don't think this is a semantic change to
> > > LLVM-IR,
> 
> > > let's consider this example from earlier:
> 
> > >
> 
> > >
> 
> > >
> 
> > > %0 = load <4 x i32> %x
> 
> > > %1 = bitcast <4 x i32> %0 to <2 x i64>
> 
> > >
> 
> > >
> 
> > >
> 
> > >
> 
> > > store <2 x i64> %1, <2 x i64>* %y
> 
> > >
> 
> > >
> 
> > >
> 
> > >
> 
> > >
> 
> > > In LLVM-IR terms, if the value of %0 is:
> 
> > >
> 
> > > %0 = 0x00112233_44556677_8899aabb_ccddeeff
> 
> > >
> 
> > > then the value of %1 is:
> 
> > >
> 
> > > %1 = 0x0011223344556677_8899aabbccddeeff
> 
> > >
> 
> > > which agrees with the store/load and the 'no bits change'
> > > statements
> 
> > > in LangRef.
> 
> > >
> 
> > >
> 
> > >
> 
> > > However, the mapping of these bits to physical register bits is
> > > not
> 
> > > consistent between types:
> 
> > >
> 
> > > Physreg(%0) = 0xccddeeff_8899aabb_44556677_00112233
> 
> > >
> 
> > > Physreg(%1) = 0x8899aabbccddeeff_0011223344556677
> 
> > >
> 
> > >
> 
> > >
> 
> > > Essentially, I'm saying that BitCastInst and ISD::BITCAST
have
> 
> > > slightly different semantics because of their different domains.
> > > The
> 
> > > former is working on an abstract representation of the values
> > > where
> 
> > > both statements in LangRef are true, but the latter is closer to
> > > the
> 
> > > target where the 'no bits change' statement ceases to be
true in
> 
> > > some cases.
> 
> > >
> 
> > >
> 
> > >
> 
> > > > A couple of points that will need clarified:
> 
> > > > - Does this only apply to vector types? It definitely
doesn't
> > > > apply
> 
> > > > between pointer types today. What about integer, floating
> > > > point,
> 
> > > > and FCAs?
> 
> > >
> 
> > >
> 
> > >
> 
> > >
> 
> > >
> 
> > > I've only seen it for vector types so far but in theory it
could
> 
> > > happen for other types. I'd expect FCAs to encounter it since
the
> 
> > > physical registers may contain padding that isn't present in
the
> 
> > > LLVM-IR representation and the placement and amount of padding
> > > will
> 
> > > depend on the exact FCA.
> 
> > >
> 
> > > I can think of cases where address space casts can encounter the
> > > same
> 
> > > problem but that's already been covered in LangRef ("It
can be a
> 
> > > no-op cast or a complex value modification, depending on the
> > > target
> 
> > > and the address space pair.").
> 
> > >
> 
> > >
> 
> > >
> 
> > > Does anyone use FCAs directly? Most targets seem to convert them
> > > to
> 
> > > same-sized integers or bitcast an FCA* to i8*.
> 
> > >
> 
> > >
> 
> > >
> 
> > >
> 
> > > > - Is combining two casts into one a legal operation? I think
it
> > > > is
> 
> > > > so far, but we need to explicitly state that.
> 
> > >
> 
> > >
> 
> > >
> 
> > >
> 
> > >
> 
> > > Yes, A->B->C and A->C are equivalent.
> 
> > >
> 
> > >
> 
> > >
> 
> > >
> 
> > > > - Do we have a predicate for identifying no-op casts that
can
> > > > be
> 
> > > > freely removed/combined?
> 
> > >
> 
> > >
> 
> > >
> 
> > >
> 
> > >
> 
> > > James mentioned one in CGP but I haven't been able to find
it. I
> 
> > > don't think it's necessary to have one at the LLVM-IR
level but
> > > we
> 
> > > do need one in the backends. I remember adding one to the backend
> 
> > > but I can't find that either so I think I'm remembering
one of my
> 
> > > patches from before I split MSA's registers into
type-specific
> 
> > > classes.
> 
> > >
> 
> > >
> 
> > >
> 
> > >
> 
> > > > - Is coercing a load to the type it's immediately
bitcast to
> > > > legal
> 
> > > > under this model?
> 
> > >
> 
> > >
> 
> > >
> 
> > >
> 
> > >
> 
> > > Yes.
> 
> > >
> 
> > >
> 
> > >
> 
> > >
> 
> > >
> 
> > >
> 
> > > From: llvm-dev [mailto: llvm-dev-bounces at lists.llvm.org ] On
> > > Behalf
> 
> > > Of Philip Reames via llvm-dev
> 
> > > Sent: 13 January 2016 20:31
> 
> > > To: James Molloy; Hal Finkel
> 
> > > Cc: llvm-dev
> 
> > >
> 
> > >
> 
> > >
> 
> > >
> 
> > >
> 
> > >
> 
> > >
> 
> > > Subject: Re: [llvm-dev] [GlobalISel] A Proposal for global
> 
> > > instruction selection
> 
> > >
> 
> > >
> 
> > >
> 
> > >
> 
> > >
> 
> > >
> 
> > >
> 
> > >
> 
> > >
> 
> > > On 01/13/2016 12:20 PM, James Molloy wrote:
> 
> > >
> 
> > >
> 
> > >
> 
> > >
> 
> > > > (Right?)
> 
> > >
> 
> > >
> 
> > >
> 
> > >
> 
> > >
> 
> > > Uh no, the register content explicitly does change :( We insert
> > > REV
> 
> > > instructions (byteswap) on each bitcast. Bitcasts can be merged
> > > and
> 
> > > elided etc, but conceptually there's a register content
change on
> 
> > > every bitcast.
> 
> > >
> 
> > > Ok. Then we need to change the LangRef as suggested. Given this
> > > is
> > > a
> 
> > > rather important semantic change, I think you need to send a top
> 
> > > level RFC to the list.
> 
> > >
> 
> > > A couple of points that will need clarified:
> 
> > > - Does this only apply to vector types? It definitely doesn't
> > > apply
> 
> > > between pointer types today. What about integer, floating point,
> > > and
> 
> > > FCAs?
> 
> > > - Is combining two casts into one a legal operation? I think it
> > > is
> > > so
> 
> > > far, but we need to explicitly state that.
> 
> > > - Do we have a predicate for identifying no-op casts that can be
> 
> > > freely removed/combined?
> 
> > > - Is coercing a load to the type it's immediately bitcast to
> > > legal
> 
> > > under this model?
> 
> > >
> 
> > >
> 
> > >
> 
> > >
> 
> > >
> 
> > >
> 
> > >
> 
> > >
> 
> > > James
> 
> > >
> 
> > >
> 
> > >
> 
> > >
> 
> > >
> 
> > > On Wed, 13 Jan 2016 at 18:09 Philip Reames <
> 
> > > listmail at philipreames.com > wrote:
> 
> > >
> 
> > >
> 
> > >
> 
> > >
> 
> > >
> 
> > > On 01/13/2016 08:01 AM, Hal Finkel via llvm-dev wrote:
> 
> > > > ----- Original Message -----
> 
> > > >> From: "James Molloy" < james at
jamesmolloy.co.uk >
> 
> > > >> To: "Hal Finkel" < hfinkel at anl.gov >
> 
> > > >> Cc: "llvm-dev" < llvm-dev at lists.llvm.org
>, "Quentin Colombet"
> > > >> <
> 
> > > >> qcolombet at apple.com >
> 
> > > >> Sent: Wednesday, January 13, 2016 9:54:26 AM
> 
> > > >> Subject: Re: [llvm-dev] [GlobalISel] A Proposal for
global
> 
> > > >> instruction selection
> 
> > > >>
> 
> > > >>
> 
> > > >>> I think that teaching the optimizer about big-Endian
lane
> 
> > > >>> ordering
> 
> > > >>> would have been better.
> 
> > > >>
> 
> > > >> It's certainly arguable. Even in hindsight I'm
glad we didn't
> > > >> -
> 
> > > >> that's the approach GCC took and they've been
fixing subtle
> > > >> bugs
> 
> > > >> in
> 
> > > >> their vectorizer ever since.
> 
> > > >>
> 
> > > >>
> 
> > > >>> Inserting the REV after every LDR
> 
> > > >>
> 
> > > >> We only do this conceptually. In most cases REVs cancel
out,
> > > >> and
> 
> > > >> we
> 
> > > >> have the LD1 instruction which is LDR+REV. With enough
> > > >> peepholes
> 
> > > >> there's really no need for code to run slower.
> 
> > > >>
> 
> > > >>
> 
> > > >>> Given what's been done, should we update the
LangRef.
> 
> > > >>
> 
> > > >> Potentially, yes. I hadn't realised quite how
strongly worded
> > > >> it
> 
> > > >> was
> 
> > > >> with respect to this.
> 
> > > >>
> 
> > > > Please do ;)
> 
> > > I'm not sure changing bitcast is the right place. Since the
> > > bitcast
> 
> > > is
> 
> > > representing the in-register value (which doesn't change),
maybe
> > > we
> 
> > > should define it as part of the load/store instead? That's
> 
> > > essentially
> 
> > > what's going on; we're converting from a canonical
register form
> > > to
> > > a
> 
> > > variety of memory forms. (Right?)
> 
> > > >
> 
> > > > -Hal
> 
> > > >
> 
> > > >> James
> 
> > > >>
> 
> > > >>
> 
> > > >> On Wed, 13 Jan 2016 at 14:39 Hal Finkel < hfinkel at
anl.gov >
> > > >> wrote:
> 
> > > >>
> 
> > > >>
> 
> > > >>
> 
> > > >>
> 
> > > >> [resending so the message is smaller]
> 
> > > >>
> 
> > > >>
> 
> > > >>
> 
> > > >>
> 
> > > >>
> 
> > > >>
> 
> > > >> From: "James Molloy via llvm-dev" <
llvm-dev at lists.llvm.org >
> 
> > > >> To: "Quentin Colombet" < qcolombet at
apple.com >
> 
> > > >> Cc: "llvm-dev" < llvm-dev at lists.llvm.org
>
> 
> > > >> Sent: Wednesday, January 13, 2016 2:35:32 AM
> 
> > > >> Subject: Re: [llvm-dev] [GlobalISel] A Proposal for
global
> 
> > > >> instruction selection
> 
> > > >>
> 
> > > >> Hi Philip,
> 
> > > >>
> 
> > > >>
> 
> > > >>
> 
> > > >>
> 
> > > >>
> 
> > > >> store <2 x i64> %1, <2 x i64>* %y
> 
> > > >>
> 
> > > >> Yes. The memory pattern differs. This is the first
diagram on
> > > >> the
> 
> > > >> right at:
http://llvm.org/docs/BigEndianNEON.html#bitconverts
> > > >> )
> 
> > > >>
> 
> > > >>
> 
> > > >> I think that teaching the optimizer about big-Endian
lane
> > > >> ordering
> 
> > > >> would have been better. Inserting the REV after every
LDR
> > > >> sounds
> 
> > > >> very similar to what we do for VSX on little-Endian
PowerPC
> 
> > > >> systems
> 
> > > >> (PowerPC may have a slight advantage here in that we
don't
> > > >> need
> > > >> to
> 
> > > >> do insertelement / extractelement / shufflevector
through
> > > >> memory
> 
> > > >> on
> 
> > > >> systems where little-Endian mode is relevant, see
> 
> > > >>
http://llvm.org/devmtg/2014-10/Slides/Schmidt-SupportingVectorProgramming.pdf
> 
> > > >> ).
> 
> > > >>
> 
> > > >> Given what's been done, should we update the
LangRef. It
> > > >> currently
> 
> > > >> reads, " The ‘ bitcast ‘ instruction converts value
to type
> > > >> ty2
> > > >> .
> 
> > > >> It
> 
> > > >> is always a no-op cast because no bits change with this
> 
> > > >> conversion.
> 
> > > >> The conversion is done as if the value had been stored
to
> > > >> memory
> 
> > > >> and
> 
> > > >> read back as type ty2 ." But this is now, at the
least,
> 
> > > >> misleading,
> 
> > > >> because this process of storing the value as one type
and
> > > >> reading
> 
> > > >> it
> 
> > > >> back in as another does, in fact, change the bits. We
need to
> > > >> make
> 
> > > >> clear that this might change the bits (perhaps
specifically by
> 
> > > >> calling out this case of vector bitcasts on big-Endian
> > > >> systems?).
> 
> > > >>
> 
> > > >>
> 
> > > >>
> 
> > > >> Also, regarding this, " Most operating systems
however do not
> > > >> run
> 
> > > >> with alignment faults enabled, so this is often not an
issue."
> > > >> Are
> 
> > > >> you saying that the processor does the correct thing in
this
> > > >> case
> 
> > > >> (if alignment faults are not enabled, then it performs a
> > > >> proper
> 
> > > >> unaligned load), or that the operating-system trap
handler
> 
> > > >> emulates
> 
> > > >> the unaligned load should one occur?
> 
> > > >>
> 
> > > >> Thanks again,
> 
> > > >> Hal
> 
> > > >>
> 
> > > >>
> 
> > > >> _______________________________________________
> 
> > > >>
> 
> > > >>
> 
> > > >> LLVM Developers mailing list
> 
> > > >> llvm-dev at lists.llvm.org
> 
> > > >> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> 
> > > >>
> 
> > > >>
> 
> > > >> --
> 
> > > >> Hal Finkel
> 
> > > >> Assistant Computational Scientist
> 
> > > >> Leadership Computing Facility
> 
> > > >> Argonne National Laboratory
> 
> > > >>
> 
> > >
> 
> > >
> 
> > >
> 
> > --
> 
> > Hal Finkel
> 
> > Assistant Computational Scientist
> 
> > Leadership Computing Facility
> 
> > Argonne National Laboratory
> 
-- 

Hal Finkel 
Assistant Computational Scientist 
Leadership Computing Facility 
Argonne National Laboratory 
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160115/c5af7996/attachment.html>

llvm dev - Jan 2016 - [GlobalISel] A Proposal for global instruction selection

[llvm-dev] [GlobalISel] A Proposal for global instruction selection

[llvm-dev] [GlobalISel] A Proposal for global instruction selection

[llvm-dev] [GlobalISel] A Proposal for global instruction selection

[llvm-dev] [GlobalISel] A Proposal for global instruction selection