thr3ads.net - llvm dev - [llvm-dev] [GlobalISel] A Proposal for global instruction selection [Jan 2016]

Hal Finkel via llvm-dev

2016-Jan-13 14:38 UTC

[llvm-dev] [GlobalISel] A Proposal for global instruction selection

[resending so the message is smaller] 

----- Original Message -----

From: "James Molloy via llvm-dev" <llvm-dev at lists.llvm.org> 
To: "Quentin Colombet" <qcolombet at apple.com> 
Cc: "llvm-dev" <llvm-dev at lists.llvm.org> 
Sent: Wednesday, January 13, 2016 2:35:32 AM 
Subject: Re: [llvm-dev] [GlobalISel] A Proposal for global instruction selection

Hi Philip, 

store <2 x i64> %1, <2 x i64>* %y 

Yes. The memory pattern differs. This is the first diagram on the right at:
http://llvm.org/docs/BigEndianNEON.html#bitconverts

I think that teaching the optimizer about big-Endian lane ordering would have
been better. Inserting the REV after every LDR sounds very similar to what we do
for VSX on little-Endian PowerPC systems (PowerPC may have a slight advantage
here in that we don't need to do insertelement / extractelement /
shufflevector through memory on systems where little-Endian mode is relevant,
see
http://llvm.org/devmtg/2014-10/Slides/Schmidt-SupportingVectorProgramming.pdf).

Given what's been done, should we update the LangRef? It currently reads,
"The 'bitcast' instruction converts value to type ty2. It is always a
no-op cast because no bits change with this conversion. The conversion is done
as if the value had been stored to memory and read back as type ty2." But
this is now, at the least, misleading, because this process of storing the value
as one type and reading it back in as another does, in fact, change the bits. We
need to make clear that this might change the bits (perhaps specifically by
calling out this case of vector bitcasts on big-Endian systems?).
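
To make the "stored to memory and read back" wording concrete, here is a minimal sketch in Python rather than LLVM IR. It models the bitcast as a store of the source lanes followed by a load of the destination lanes. The helper name `bitcast_via_memory` and the sample `<2 x i64>` value are made up for illustration; the sketch only models byte order in memory and says nothing about how any particular backend lays out lanes in registers.

```python
import struct

def bitcast_via_memory(lanes, src_fmt, dst_fmt, byte_order):
    """Store `lanes` with layout src_fmt, then reload the same bytes as dst_fmt.

    byte_order is "<" (little-endian) or ">" (big-endian), as in the struct module.
    """
    raw = struct.pack(byte_order + src_fmt, *lanes)        # the "store"
    return list(struct.unpack(byte_order + dst_fmt, raw))  # the "load"

# Hypothetical <2 x i64> value, reinterpreted as <4 x i32>.
v2i64 = [0x0001020304050607, 0x08090A0B0C0D0E0F]

le = bitcast_via_memory(v2i64, "2Q", "4I", "<")
be = bitcast_via_memory(v2i64, "2Q", "4I", ">")

print([hex(x) for x in le])  # ['0x4050607', '0x10203', '0xc0d0e0f', '0x8090a0b']
print([hex(x) for x in be])  # ['0x10203', '0x4050607', '0x8090a0b', '0xc0d0e0f']
```

The two byte orders put the i32 halves of each i64 in opposite lane positions. So a target that keeps lanes in one fixed register order cannot satisfy the store-and-reload definition with a pure no-op on both endiannesses; on one of them the lanes have to be permuted.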

Also, regarding this, "Most operating systems however do not run with
alignment faults enabled, so this is often not an issue." Are you saying
that the processor does the correct thing in this case (if alignment faults are
not enabled, then it performs a proper unaligned load), or that the
operating-system trap handler emulates the unaligned load should one occur?

Thanks again, 
Hal 
-- 
Hal Finkel 
Assistant Computational Scientist 
Leadership Computing Facility 
Argonne National Laboratory 

James Molloy via llvm-dev

2016-Jan-13 15:54 UTC

> I think that teaching the optimizer about big-Endian lane ordering would have been better.

It's certainly arguable. Even in hindsight I'm glad we didn't - that's the
approach GCC took and they've been fixing subtle bugs in their vectorizer
ever since.

> Inserting the REV after every LDR

We only do this conceptually. In most cases REVs cancel out, and we have
the LD1 instruction which is LDR+REV. With enough peepholes there's really
no need for code to run slower.

> Given what's been done, should we update the LangRef?

Potentially, yes. I hadn't realised quite how strongly worded it was with
respect to this.

James
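
The remark that REVs cancel out rests on lane reversal being its own inverse. Below is a toy sketch of such a peephole in Python. The tuple encoding of instructions and the function name `cancel_adjacent_revs` are assumptions made up for this illustration and are not LLVM's representation. It removes only back-to-back identical REVs and does not model the LDR+REV to LD1 folding.

```python
def cancel_adjacent_revs(ops):
    """Drop back-to-back identical REV ops; reversing lanes twice is the identity."""
    out = []
    for op in ops:
        if out and op[0] == "REV" and out[-1] == op:
            out.pop()        # REV followed by the same REV: remove both
        else:
            out.append(op)
    return out

# Hypothetical instruction stream: (opcode, register[, element size in bits]).
ops = [("LDR", "q0"), ("REV", "q0", 64), ("REV", "q0", 64), ("STR", "q0")]
print(cancel_adjacent_revs(ops))  # [('LDR', 'q0'), ('STR', 'q0')]
```

REVs with different element sizes, or on different registers, are left alone because the tuples do not compare equal.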


Hal Finkel via llvm-dev

2016-Jan-13 16:01 UTC

----- Original Message -----
> From: "James Molloy" <james at jamesmolloy.co.uk>
> To: "Hal Finkel" <hfinkel at anl.gov>
> Cc: "llvm-dev" <llvm-dev at lists.llvm.org>, "Quentin Colombet" <qcolombet at apple.com>
> Sent: Wednesday, January 13, 2016 9:54:26 AM
> Subject: Re: [llvm-dev] [GlobalISel] A Proposal for global instruction selection
> 
> 
> > I think that teaching the optimizer about big-Endian lane ordering
> > would have been better.
> 
> 
> It's certainly arguable. Even in hindsight I'm glad we didn't -
> that's the approach GCC took and they've been fixing subtle bugs in
> their vectorizer ever since.
> 
> 
> > Inserting the REV after every LDR
> 
> 
> We only do this conceptually. In most cases REVs cancel out, and we
> have the LD1 instruction which is LDR+REV. With enough peepholes
> there's really no need for code to run slower.
> 
> 
> > Given what's been done, should we update the LangRef?
> 
> 
> Potentially, yes. I hadn't realised quite how strongly worded it was
> with respect to this.
> 
Please do ;)

 -Hal
> 
> James
-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory
