similar to: [LLVMdev] Should more vector [zs]extloads be legal for X86 SSE4.1?

Displaying 20 results from an estimated 900 matches similar to: "[LLVMdev] Should more vector [zs]extloads be legal for X86 SSE4.1?"

2015 Mar 04
2
[LLVMdev] ReduceLoadWidth, DAGCombiner and non 8bit loads/extloads question.
On Wed, Mar 4, 2015 at 10:49 AM, Ahmed Bougacha <ahmed.bougacha at gmail.com> wrote: > On Wed, Mar 4, 2015 at 10:26 AM, Ryan Taylor <ryta1203 at gmail.com> wrote: >> Yes, it is breaking during the legalize phase, depending on which >> TargetLowering callback method we use. For example, Custom will let it >> through to instructions selection, which it breaks at the
2015 Mar 04
2
[LLVMdev] ReduceLoadWidth, DAGCombiner and non 8bit loads/extloads question.
Ahmed, Yes, this is the case, I'm sure many other 'spots' in DAGCombiner use this same check or use a similar check with LegalOperations. It just seems like bad form to have core code that generates an illegal node that legalization cannot seem to handle, unless I'm missing something, which is entirely possible. Potentially we are using the wrong LegalAction, though each I've
2015 Mar 05
2
[LLVMdev] ReduceLoadWidth, DAGCombiner and non 8bit loads/extloads question.
On Wed, Mar 4, 2015 at 11:43 AM, Ryan Taylor <ryta1203 at gmail.com> wrote: > Ahmed, > > Yes, we do not have an 8 bit type and do not support 8 bit loads/extloads. > > For your first post, I imagine that anything that the DAGCombiner does it > could undo EXCEPT deciding to opt to a type that is not allowed, No, I think the SelectionDAG legalization should be able to
2015 Mar 03
2
[LLVMdev] ReduceLoadWidth, DAGCombiner and non 8bit loads/extloads question.
1) It's crashing because LD1 is produced due to LegalOperations=false in pre-legalize pass. Then Legalization does not know how to handle it so it asserts on a default case. I don't know if it's a reasonable expectation or not but we do not have support for it. I have not tried overriding shouldReduceLoadWidth. 2) I see, that makes sense to some degree, I'm curious if you can
2015 Mar 03
3
[LLVMdev] ReduceLoadWidth, DAGCombiner and non 8bit loads/extloads question.
I'm curious about this code in ReduceLoadWidth (and in DAGCombiner in general): if (LegalOperations && !TLI.isLoadExtLegal(ExtType, ExtVT)) return SDValue <http://llvm.org/docs/doxygen/html/classllvm_1_1SDValue.html>(); LegalOperations is false for the first pre-legalize pass and true for the post-legalize pass. The first pass is target-independent yes? So that makes sense.
2008 Nov 20
1
Can't start pand -- is the problem in /etc/rc.d/init.d/pand?
I have been stepping into pand. There is nothing I can find on this list about getting pand working. There are some instructions on wifi.bluez.org that seem to be redhatish, but.... I have edited /etc/sysconfig/pand: PANDARGS='--listen --role GN --master' I have created a /etc/sysconfig/network-scripts/ifcfg-bnep0: DEVICE=bnep0 BOOTPROTO=none ONBOOT=no IPADDR=192.168.201.101
2020 Jan 03
2
Legalizing vector types
Hi all, I am working on a target that has support for v4i16 vectors, and no support for v4i8 / v8i8 / v8i16 V4i8 is promoted to v4i16 which is nice V8i16 is split to 2 x v4i16 which is nice as well Now v8i8 is scalarized, which is not so nice. Ideally I would like v8i8 to be first promoted to v8i16 then split to 2xv4i16 (or split to 2xV4i8 then promoted to 2xv4i16) Is there a way to achieve
2019 Sep 10
2
tablegen exponential behavior
Hi, I implemented a pattern matching of the dot product for arm64 and it seemed to work well for the basic case, i.e., class mulB<SDPatternOperator ldop> : PatFrag<(ops node:$Rn, node:$Rm, node:$offset), (mul (ldop (add node:$Rn, node:$offset)), (ldop (add node:$Rm, node:$offset)))>; class mulBz<SDPatternOperator ldop> : PatFrag<(ops node:$Rn,
2010 May 11
2
[LLVMdev] How does SSEDomainFix work?
Hello. This is my 1st post. I have tried SSE execution domain fixup pass. But I am not able to see any improvements. I expect for the example below to use MOVDQA, PAND &c. (On nehalem, ANDPS is extremely slower than PAND) Please tell me if something would be wrong for me. Thank you. Takumi Host: i386-mingw32 Build: trunk at 103373 foo.ll: define <4 x i32> @foo(<4 x i32> %x,
2010 May 11
0
[LLVMdev] How does SSEDomainFix work?
On May 10, 2010, at 9:07 PM, NAKAMURA Takumi wrote: > Hello. This is my 1st post. ようこそ! > I have tried SSE execution domain fixup pass. > But I am not able to see any improvements. Did you actually measure runtime, or did you look at assembly? > I expect for the example below to use MOVDQA, PAND &c. > (On nehalem, ANDPS is extremely slower than PAND) Are you sure? The
2015 Mar 06
2
[LLVMdev] ReduceLoadWidth, DAGCombiner and non 8bit loads/extloads question.
On Thu, Mar 5, 2015 at 4:19 PM, Ryan Taylor <ryta1203 at gmail.com> wrote: > Thanks for the reply: > > So should LLVM continue to assume 8-bit byte addressing? It would be nice, > not only to us but potential future machines, to have a permanent fix to > this assumption? This sounds reasonable yes? > > Marking them as Custom in XXXISelLowering still produces error, the
2014 Nov 08
2
[LLVMdev] [RFC] Exhaustive bitcode compatibility tests for IR features
It sounds like the Android RenderScript guys have the most in-the-trenches experience with bitcode incompatibilities. Stephen Hines (CC'd), what sorts of incompatibilities have you guys seen during the 3.x timeline? Would Steven Wu's proposal catch the sorts of incompatibilities that you guys have seen? -- Sean Silva On Thu, Nov 6, 2014 at 4:38 PM, Steven Wu <stevenwu at apple.com>
2016 Apr 11
2
X86 TRUNCATE cost for AVX & AVX2 mode
Hi, I was going through the X86TTIImpl::getCastInstrCost, and got a doubt on cost calculation for TRUNCATE instruction in AVX mode. In AVX2ConversionTbl & AVXConversionTbl table there is no cost defined for TRUNCATE v16i32 to v16i8, as a fallback it goes to SSE41ConversionTbl table and there it finds cost as 30 for this operation. 30 cost for this operation looks very high. Wondering why
2013 Mar 04
1
[LLVMdev] Custom Lowering of ARM zero-extending loads
Hi, For my research, I need to reshape the current ARM backend to support armv2a. Zero-extend half word load (ldrh) is not supported by armv2a, so I need to make the code generation to not generate ldrh instructions. I want to replace all those instances with a 32-bit load (ldr) and then and the result with 0xffff to mask out the upper bits. These are the modifications that I have made to
2012 Dec 14
0
9.1-RC3 problems with Creative Audigy 2 ZS [SB0350] audio
Hello :-) Recently I have switched to Creative Audigy 2 ZS [SB0350] PCI audio card from SoundBlaster Live! The new audio device although using the same kernel driver have some problems with audio/video streams - sound does not resemble original at all and the video player hangs (until audio buffer is flushed I guess). Did anyone enountered similar problem? I guess there is something wrong with
2009 Feb 19
2
[LLVMdev] Possible error in LegalizeDAG
-----Original Message----- From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Eli Friedman Sent: Wednesday, February 18, 2009 3:01 PM To: LLVM Developers Mailing List Subject: Re: [LLVMdev] Possible error in LegalizeDAG On Wed, Feb 18, 2009 at 10:14 AM, Villmow, Micah <Micah.Villmow at amd.com> wrote: > I'm still trying to track down some
2009 Feb 19
0
[LLVMdev] Possible error in LegalizeDAG
On Thu, Feb 19, 2009 at 10:35 AM, Villmow, Micah <Micah.Villmow at amd.com> wrote: > On the hardware that I am targeting, which is not a CPU, I must support > i8 loads, however the hardware only supports natively 32bit aligned > loads, therefore I have to read in 4 i8's and unpack them and shift them > based on the read address. So any i8 load has a 75% chance of being >
2008 Nov 13
0
bluetooth pand help
Help, please. The man pand talks about a /etc/bluetooth/pan/dev-up; there is no such file on any of my Centos systems. I have studied http://bluez.sourceforge.net/contrib/HOWTO-PAN, so I have some of the basics. The only Centos related pand writeup I have found is http://howto.basjes.nl/linux/installing-my-new-server/networking. Should I just copy the dev-up file from there?
2014 Sep 19
2
[PATCH 0/2] nv50, nvc0: fix weirdo zs formats and their blits
There were reports of issues with gallium-nine. It's unclear whether mesa/st uses these, the patches did not produce any piglit changes. However they seem right... Ilia Mirkin (2): nv50,nvc0: add missing depth/stencil formats to tile flag selection nv50,nvc0: fix 3d blit logic for odd depth/stencil formats src/gallium/drivers/nouveau/nv50/nv50_blit.h | 21 ++++++++++++++-------
2009 Dec 10
0
[LLVMdev] SplitVecRes with SIGN_EXTEND_INREG unsupported
Eli, I have a simple SplitVecRes function that implements what you mentioned, splitting the LHS just as in BinaryOp, but passing through the RHS. The problem is that the second operand is MVT::Other, but when casted to an VTSDNode reveals that it is a vector length of the same size as the LHS SDValue. This causes a split on the LHS side to work correctly, but then it fails instruction selection