Ahmed Bougacha
2014-Dec-02 22:24 UTC
[LLVMdev] Should more vector [zs]extloads be legal for X86 SSE4.1?
Hi Chandler, all,

Why aren't the vector [zs]extloads introduced by SSE4.1/AVX2 declared legal? Is it a simple oversight, or did I miss a deeper reason?

While cleaning up PMOV*X patterns, I stumbled upon this braindead testcase:

  %0 = load <8 x i8>* %src, align 1
  %1 = zext <8 x i8> %0 to <8 x i16>

turning into:

  pmovzxbw (%rsi), %xmm0
  pand <0xff,0xff,...>, %xmm0, %xmm0

v8i8 isn't legal, so the load became an anyext load from v8i8 to v8i16, with the pand masking out the unwanted/zero bits.

In that example, if you declare zextloads from v8i8 legal, and add the simple corresponding pattern, the pand isn't generated anymore, as expected.

So, unless I'm missing something, shouldn't we declare them legal?

Insights much appreciated, thanks!
- Ahmed
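To make the redundancy concrete, here is a small scalar C++ model of the two lowerings. It is only a sketch: the helper names zextLoad, anyextLoad and maskLowBytes are hypothetical, no SSE intrinsics are used, and the anyext case fills the unspecified high byte with an arbitrary `filler` value to stand in for "undefined". It shows that the pand is required after an anyext load but is a no-op after a true zero-extending load such as pmovzxbw.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar model of an 8 x i8 -> 8 x i16 extending load.
using Bytes = std::array<std::uint8_t, 8>;
using Words = std::array<std::uint16_t, 8>;

// zextload (what pmovzxbw does): the high byte of every lane is zero.
Words zextLoad(const Bytes& src) {
  Words out{};
  for (std::size_t i = 0; i < src.size(); ++i) out[i] = src[i];
  return out;
}

// anyext load: the high byte of each lane is unspecified; `filler`
// stands in for those undefined bits so the mask's effect is visible.
Words anyextLoad(const Bytes& src, std::uint8_t filler) {
  Words out{};
  for (std::size_t i = 0; i < src.size(); ++i)
    out[i] = static_cast<std::uint16_t>((filler << 8) | src[i]);
  return out;
}

// The `pand <0xff,0xff,...>`: keep only the low byte of each 16-bit lane.
Words maskLowBytes(const Words& v) {
  Words out{};
  for (std::size_t i = 0; i < v.size(); ++i)
    out[i] = static_cast<std::uint16_t>(v[i] & 0x00ff);
  return out;
}
```

For instance, with src = {0x00, 0x7f, 0x80, 0xff, 0x12, 0x34, 0xab, 0xcd}, maskLowBytes(anyextLoad(src, 0xee)) equals zextLoad(src), and maskLowBytes(zextLoad(src)) also equals zextLoad(src). The second identity is why the pand can go away once the load is known to be a real zextload.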
Chandler Carruth
2014-Dec-03 04:21 UTC
[LLVMdev] Should more vector [zs]extloads be legal for X86 SSE4.1?
On Tue, Dec 2, 2014 at 2:24 PM, Ahmed Bougacha <ahmed.bougacha at gmail.com> wrote:
> Hi Chandler, all,
>
> Why aren't the vector [zs]extloads introduced by SSE4.1/AVX2 declared
> legal? Is it a simple oversight, or did I miss a deeper reason?

While hacking on this, I tried to make them legal, and failed. I don't recall everything that went wrong though, and perhaps you'll have better luck than I did.

> While cleaning up PMOV*X patterns, I stumbled upon this braindead testcase:
>
>   %0 = load <8 x i8>* %src, align 1
>   %1 = zext <8 x i8> %0 to <8 x i16>
>
> turning into:
>
>   pmovzxbw (%rsi), %xmm0
>   pand <0xff,0xff,...>, %xmm0, %xmm0
>
> v8i8 isn't legal, so the load became an anyext load from v8i8 to
> v8i16, with the pand masking out the unwanted/zero bits.

I've seen this too. It's horrible.

> In that example, if you declare zextloads from v8i8 legal, and add the
> simple corresponding pattern, the pand isn't generated anymore, as
> expected.

Won't type legalization insist on legalizing the <8 x i8> type even though we can do the extload? My memory of this is very dim. If this "just works" as expected, then by all means, let's do it.

Speaking of which, I should actually go nuke the old shuffle lowering. Some of my problems may have only been problems with it.
Ahmed Bougacha
2014-Dec-04 19:02 UTC
[LLVMdev] Should more vector [zs]extloads be legal for X86 SSE4.1?
On Tue, Dec 2, 2014 at 8:21 PM, Chandler Carruth <chandlerc at gmail.com> wrote:
> On Tue, Dec 2, 2014 at 2:24 PM, Ahmed Bougacha <ahmed.bougacha at gmail.com>
> wrote:
>>
>> Hi Chandler, all,
>>
>> Why aren't the vector [zs]extloads introduced by SSE4.1/AVX2 declared
>> legal? Is it a simple oversight, or did I miss a deeper reason?
>
> While hacking on this, I tried to make them legal, and failed. I don't
> recall everything that went wrong though, and perhaps you'll have better
> luck than I did.

So I gave it a shot, but to avoid an ugly hack in X86, I tried to do the right thing and changed the extload legalization logic to use a 2D array (indexed by both memory and result types, not just the memory type), which just seems like the saner thing to do:
  http://reviews.llvm.org/D6532

The X86 stuff is much simpler:
  http://reviews.llvm.org/D6533

>> While cleaning up PMOV*X patterns, I stumbled upon this braindead
>> testcase:
>>
>>   %0 = load <8 x i8>* %src, align 1
>>   %1 = zext <8 x i8> %0 to <8 x i16>
>>
>> turning into:
>>
>>   pmovzxbw (%rsi), %xmm0
>>   pand <0xff,0xff,...>, %xmm0, %xmm0
>>
>> v8i8 isn't legal, so the load became an anyext load from v8i8 to
>> v8i16, with the pand masking out the unwanted/zero bits.
>
> I've seen this too. It's horrible.
>
>> In that example, if you declare zextloads from v8i8 legal, and add the
>> simple corresponding pattern, the pand isn't generated anymore, as
>> expected.
>
> Won't type legalization insist on legalizing the <8 x i8> type even though
> we can do the extload? My memory of this is very dim. If this "just works"
> as expected, then by all means, let's do it.

From what I've seen, not if the illegal type is only an extload's (or truncstore's) memory type: in that case, anything is OK.

- Ahmed

> Speaking of which, I should actually go nuke the old shuffle lowering. Some
> of my problems may have only been problems with it.
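The difference between keying extload legality on the memory type alone and keying it on (result type, memory type) can be sketched with a small standalone C++ model. This is a hypothetical illustration, not the code in D6532: the enums, the class names LoadExtTable1D/LoadExtTable2D and the three vector types are assumptions. It shows that with a memory-type-only table, marking zextload from v8i8 legal also answers "legal" for a pair such as v8i8 -> v8i32 that pmovzxbw does not cover, while the 2D table can mark just v8i8 -> v8i16.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical standalone model; the type names mimic LLVM's MVT
// spellings, but none of this is LLVM code.
enum class VT : std::size_t { v8i8, v8i16, v8i32, Count };
enum class ExtType : std::size_t { Sext, Zext, Count };
enum class Action { Expand, Legal };

template <typename E>
constexpr std::size_t idx(E e) { return static_cast<std::size_t>(e); }

constexpr std::size_t kNumVT = idx(VT::Count);
constexpr std::size_t kNumExt = idx(ExtType::Count);

// Memory-type-only scheme: one entry answers for every result type.
class LoadExtTable1D {
 public:
  LoadExtTable1D() {
    for (auto& row : table_) row.fill(Action::Expand);
  }
  void set(ExtType ext, VT mem, Action a) { table_[idx(ext)][idx(mem)] = a; }
  // The result type is accepted but ignored, which is the problem.
  Action get(ExtType ext, VT /*result*/, VT mem) const {
    return table_[idx(ext)][idx(mem)];
  }

 private:
  std::array<std::array<Action, kNumVT>, kNumExt> table_;
};

// Per extension kind, a 2D [result][memory] table, so each pair the
// hardware supports can be marked on its own.
class LoadExtTable2D {
 public:
  LoadExtTable2D() {
    for (auto& plane : table_)
      for (auto& row : plane) row.fill(Action::Expand);
  }
  void set(ExtType ext, VT result, VT mem, Action a) {
    table_[idx(ext)][idx(result)][idx(mem)] = a;
  }
  Action get(ExtType ext, VT result, VT mem) const {
    return table_[idx(ext)][idx(result)][idx(mem)];
  }

 private:
  std::array<std::array<std::array<Action, kNumVT>, kNumVT>, kNumExt> table_;
};
```

As far as I know, later LLVM versions expose this per-pair form through TargetLowering, with X86 calls of the shape setLoadExtAction(ISD::ZEXTLOAD, MVT::v8i16, MVT::v8i8, Legal); check the current TargetLowering headers before relying on that exact spelling.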