Hal Finkel via llvm-dev
2016-Jan-13 16:01 UTC
[llvm-dev] [GlobalISel] A Proposal for global instruction selection
----- Original Message -----
> From: "James Molloy" <james at jamesmolloy.co.uk>
> To: "Hal Finkel" <hfinkel at anl.gov>
> Cc: "llvm-dev" <llvm-dev at lists.llvm.org>, "Quentin Colombet" <qcolombet at apple.com>
> Sent: Wednesday, January 13, 2016 9:54:26 AM
> Subject: Re: [llvm-dev] [GlobalISel] A Proposal for global instruction selection
>
> > I think that teaching the optimizer about big-Endian lane ordering
> > would have been better.
>
> It's certainly arguable. Even in hindsight I'm glad we didn't -
> that's the approach GCC took and they've been fixing subtle bugs in
> their vectorizer ever since.
>
> > Inserting the REV after every LDR
>
> We only do this conceptually. In most cases REVs cancel out, and we
> have the LD1 instruction which is LDR+REV. With enough peepholes
> there's really no need for code to run slower.
>
> > Given what's been done, should we update the LangRef.
>
> Potentially, yes. I hadn't realised quite how strongly worded it was
> with respect to this.

Please do ;)

 -Hal

> James
>
> On Wed, 13 Jan 2016 at 14:39 Hal Finkel < hfinkel at anl.gov > wrote:
>
> [resending so the message is smaller]
>
> From: "James Molloy via llvm-dev" < llvm-dev at lists.llvm.org >
> To: "Quentin Colombet" < qcolombet at apple.com >
> Cc: "llvm-dev" < llvm-dev at lists.llvm.org >
> Sent: Wednesday, January 13, 2016 2:35:32 AM
> Subject: Re: [llvm-dev] [GlobalISel] A Proposal for global
> instruction selection
>
> Hi Philip,
>
> store <2 x i64> %1, <2 x i64>* %y
>
> Yes. The memory pattern differs. This is the first diagram on the
> right at: http://llvm.org/docs/BigEndianNEON.html#bitconverts )
>
> I think that teaching the optimizer about big-Endian lane ordering
> would have been better. Inserting the REV after every LDR sounds
> very similar to what we do for VSX on little-Endian PowerPC systems
> (PowerPC may have a slight advantage here in that we don't need to
> do insertelement / extractelement / shufflevector through memory on
> systems where little-Endian mode is relevant, see
> http://llvm.org/devmtg/2014-10/Slides/Schmidt-SupportingVectorProgramming.pdf
> ).
>
> Given what's been done, should we update the LangRef. It currently
> reads, "The ‘bitcast’ instruction converts value to type ty2. It
> is always a no-op cast because no bits change with this conversion.
> The conversion is done as if the value had been stored to memory and
> read back as type ty2." But this is now, at the least, misleading,
> because this process of storing the value as one type and reading it
> back in as another does, in fact, change the bits. We need to make
> clear that this might change the bits (perhaps specifically by
> calling out this case of vector bitcasts on big-Endian systems?).
>
> Also, regarding this, "Most operating systems however do not run
> with alignment faults enabled, so this is often not an issue." Are
> you saying that the processor does the correct thing in this case
> (if alignment faults are not enabled, then it performs a proper
> unaligned load), or that the operating-system trap handler emulates
> the unaligned load should one occur?
>
> Thanks again,
> Hal

--
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory
Philip Reames via llvm-dev
2016-Jan-13 18:08 UTC
[llvm-dev] [GlobalISel] A Proposal for global instruction selection
On 01/13/2016 08:01 AM, Hal Finkel via llvm-dev wrote:
> ----- Original Message -----
>> From: "James Molloy" <james at jamesmolloy.co.uk>
>> To: "Hal Finkel" <hfinkel at anl.gov>
>> Cc: "llvm-dev" <llvm-dev at lists.llvm.org>, "Quentin Colombet" <qcolombet at apple.com>
>> Sent: Wednesday, January 13, 2016 9:54:26 AM
>> Subject: Re: [llvm-dev] [GlobalISel] A Proposal for global instruction selection
>>
>>> Given what's been done, should we update the LangRef.
>>
>> Potentially, yes. I hadn't realised quite how strongly worded it was
>> with respect to this.
>
> Please do ;)

I'm not sure changing bitcast is the right place. Since the bitcast is
representing the in-register value (which doesn't change), maybe we
should define it as part of the load/store instead? That's essentially
what's going on; we're converting from a canonical register form to a
variety of memory forms. (Right?)
James Molloy via llvm-dev
2016-Jan-13 20:20 UTC
[llvm-dev] [GlobalISel] A Proposal for global instruction selection
> (Right?)

Uh no, the register content explicitly does change :( We insert REV
instructions (byteswap) on each bitcast. Bitcasts can be merged and
elided etc, but conceptually there's a register content change on every
bitcast.

James

On Wed, 13 Jan 2016 at 18:09 Philip Reames <listmail at philipreames.com> wrote:
> I'm not sure changing bitcast is the right place. Since the bitcast is
> representing the in-register value (which doesn't change), maybe we
> should define it as part of the load/store instead? That's essentially
> what's going on; we're converting from a canonical register form to a
> variety of memory forms. (Right?)
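
The LangRef wording quoted earlier in the thread ("as if the value had been stored to memory and read back as type ty2") can be tried out with a small Python model. This is only a sketch under stated assumptions: the function name `bitcast_2xi64_to_4xi32` is made up for illustration, Python's `struct` module stands in for memory, and lane 0 is assumed to sit at the lowest address on both byte orders. It models the memory-based definition only, and says nothing about where the AArch64 backend actually places REV instructions.

```python
import struct

def bitcast_2xi64_to_4xi32(lanes, byte_order):
    """Model the store-and-reload wording for <2 x i64> -> <4 x i32>.

    byte_order is a struct prefix: "<" for little-endian, ">" for big-endian.
    Assumption: lane 0 is stored at the lowest address in both cases.
    """
    # "Store" the two 64-bit lanes to memory in the target byte order.
    memory = struct.pack(byte_order + "2Q", *lanes)
    # "Load" the same 16 bytes back as four 32-bit lanes.
    return list(struct.unpack(byte_order + "4I", memory))

lanes = [0x0011223344556677, 0x8899AABBCCDDEEFF]
little = bitcast_2xi64_to_4xi32(lanes, "<")
big = bitcast_2xi64_to_4xi32(lanes, ">")

print("little-endian:", [hex(x) for x in little])
print("big-endian:   ", [hex(x) for x in big])
```

In this model, lane 0 of the result holds the low half of the first i64 on little-endian and the high half on big-endian. So if a register layout is kept the same regardless of endianness, the memory-based definition cannot be a pure no-op on both targets, which appears to be the tension the thread is discussing.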