thr3ads.net - search: "v16i32"

Displaying 20 results from an estimated 47 matches for "v16i32".

2013 May 10

[LLVMdev] Predicated Vector Operations

...Constraints = "$dst = $oldvalue" in { > def MASKEDARITH : MyInstruction< > (outs VectorReg:$dst), > (ins MaskReg:$mask, VectorReg:$src1, VectorReg:$src2, > VectorReg:$oldvalue), > "add $dst {$mask}, $src1, $src2", > [(set v16i32:$dst, (vselect v16i1:$mask, (add v16i32:$src1, > v16i32:$src2), v16i32:$oldvalue))]>; > } Ok, but where does $oldvalue come from? That is the tricky part as far as I can see and is why this isn't quite the same as handling two-address instructions. I agree that the pattern itself i...

2016 Apr 12

X86 TRUNCATE cost for AVX & AVX2 mode

<Copied Cong> Thanks Elena. Mostly I was interested in why such a high cost of 30 is kept for TRUNCATE v16i32 to v16i8 in SSE41. Looking at the code, it appears that TRUNCATE v16i32 to v16i8 is very expensive in SSE41 compared with SSE2. I feel this number should be the same as, or close to, the cost given for the same operation in SSE2ConversionTbl. The patch below from Cong Hou reduces the cost of the same operation in SSE2 mode. http://r...

2010 Jul 15

[LLVMdev] v16i32/v16f32

Hi, I find that types such as v16i32 and v16f32 are missing in my LLVM version 2.7. The following page does not list them either: http://llvm.org/docs/doxygen/html/classllvm_1_1MVT.html Is that intentional for any reason, or can I just add them? Thanks, shrey

2010 Jul 15

[LLVMdev] v16i32/v16f32

On Wed, Jul 14, 2010 at 6:48 PM, shreyas krishnan <shreyas76 at gmail.com> wrote: > Hi > I find types such as v16i32, v16f32 missing in my llvm version 2.7 > > So does the following page not list them > http://llvm.org/docs/doxygen/html/classllvm_1_1MVT.html > > is that intentional for any reason or can I just add them ? As far as I know, they're not there simply because there isn't any...

2013 May 11

[LLVMdev] Predicated Vector Operations

...t = $oldvalue" in { >> def MASKEDARITH : MyInstruction< >> (outs VectorReg:$dst), >> (ins MaskReg:$mask, VectorReg:$src1, VectorReg:$src2, >> VectorReg:$oldvalue), >> "add $dst {$mask}, $src1, $src2", >> [(set v16i32:$dst, (vselect v16i1:$mask, (add v16i32:$src1, >> v16i32:$src2), v16i32:$oldvalue))]>; >> } > > Ok, but where does $oldvalue come from? That is the tricky part as far > as I can see and is why this isn't quite the same as handling > two-address instructions. I may...

2016 Apr 11

X86 TRUNCATE cost for AVX & AVX2 mode

Hi, I was going through X86TTIImpl::getCastInstrCost and have a question about the cost calculation for the TRUNCATE instruction in AVX mode. Neither AVX2ConversionTbl nor AVXConversionTbl defines a cost for TRUNCATE v16i32 to v16i8, so the lookup falls back to SSE41ConversionTbl, where it finds a cost of 30 for this operation. A cost of 30 looks very high. I am wondering why such a high cost is kept for this; any pointers that help me understand it would be welcome. In a few cases this restricts better vectorization...
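The fallback described in this message can be sketched as a lookup over an ordered list of tables. This is a simplified illustration, not the actual X86TTIImpl::getCastInstrCost code: `CostEntry`, `lookupCost` and `castCostWithFallback` are hypothetical names, and types are plain strings rather than LLVM value types. The only number taken from the thread is the SSE41 cost of 30.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for one row of a conversion cost table.
struct CostEntry {
    std::string op;   // operation name, e.g. "TRUNCATE"
    std::string dst;  // destination type, e.g. "v16i8"
    std::string src;  // source type, e.g. "v16i32"
    int cost;         // non-negative cost
};

using CostTable = std::vector<CostEntry>;

// Returns the cost of the first matching row, or -1 if the table has no such row.
int lookupCost(const CostTable& table, const std::string& op,
               const std::string& dst, const std::string& src) {
    for (const CostEntry& e : table) {
        // Costs must be non-negative so that -1 can mean "not found".
        assert(e.cost >= 0);
        if (e.op == op && e.dst == dst && e.src == src) {
            return e.cost;
        }
    }
    return -1;
}

// Consults the tables in the given order (most specific first) and returns
// the first hit, or -1 if no table has a matching row.
int castCostWithFallback(const std::vector<CostTable>& tables,
                         const std::string& op, const std::string& dst,
                         const std::string& src) {
    for (const CostTable& table : tables) {
        const int cost = lookupCost(table, op, dst, src);
        if (cost >= 0) {
            return cost;
        }
    }
    return -1;
}
```

With empty AVX2 and AVX tables and an SSE41 table holding the row {"TRUNCATE", "v16i8", "v16i32", 30}, `castCostWithFallback` returns 30, which mirrors the behaviour the poster observed.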

2014 Oct 26

[LLVMdev] Masked vector intrinsics and name mangling

Hal, thank you for your opinion. I was just confused when I saw such a long name: "llvm.masked.load.v16i32.p0i32.v16i32.i32.v16i1". If we stay with a short name, we take a step towards instruction form. - Elena -----Original Message----- From: Hal Finkel [mailto:hfinkel at anl.gov] Sent: Sunday, October 26, 2014 17:06 To: Demikhovsky, Elena Cc: llvmdev at cs.uiuc.edu Subject: Re: [LLVMdev] Ma...

2013 May 10

[LLVMdev] Predicated Vector Operations

...election of the sources): let Constraints = "$dst = $oldvalue" in { def MASKEDARITH : MyInstruction< (outs VectorReg:$dst), (ins MaskReg:$mask, VectorReg:$src1, VectorReg:$src2, VectorReg:$oldvalue), "add $dst {$mask}, $src1, $src2", [(set v16i32:$dst, (vselect v16i1:$mask, (add v16i32:$src1, v16i32:$src2), v16i32:$oldvalue))]>; } That's actually pretty clean. Thanks On Thu, May 9, 2013 at 2:15 PM, Arnold Schwaighofer <aschwaighofer at apple.com> wrote: > > On May 9, 2013, at 3:05 PM, Jeff Bush <jeffbush001 at gmail...
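What the `vselect` pattern in this message computes can be sketched lane by lane in plain C++. This is only a model of the semantics, assuming 16 lanes of 32-bit integers with wrapping addition; `V16i32`, `V16i1` and `maskedAdd` are hypothetical names, not part of any LLVM API.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Unsigned lanes are used so that overflow wraps instead of being undefined.
using V16i32 = std::array<std::uint32_t, 16>;
using V16i1 = std::array<bool, 16>;

// Models (vselect mask, (add src1, src2), oldvalue):
// dst[i] = mask[i] ? src1[i] + src2[i] : oldvalue[i].
V16i32 maskedAdd(const V16i1& mask, const V16i32& src1, const V16i32& src2,
                 const V16i32& oldvalue) {
    // Starting from oldvalue mirrors the "$dst = $oldvalue" constraint:
    // lanes whose mask bit is clear keep their previous contents.
    V16i32 dst = oldvalue;
    for (std::size_t i = 0; i < dst.size(); ++i) {
        if (mask[i]) {
            dst[i] = src1[i] + src2[i];
        }
    }
    return dst;
}
```

The sketch makes the open question in the thread concrete: the caller has to supply `oldvalue` from somewhere, which is why the tied-operand constraint alone does not settle how the previous destination value is tracked.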

2010 Jul 17

[LLVMdev] v16i32/v16f32

...gh!"); What does the assertion mean ? thanks for all help!! shrey On Wed, Jul 14, 2010 at 6:56 PM, Eli Friedman <eli.friedman at gmail.com> wrote: > On Wed, Jul 14, 2010 at 6:48 PM, shreyas krishnan <shreyas76 at gmail.com> wrote: >> Hi >> I find types such as v16i32, v16f32 missing in my llvm version 2.7 >> >> So does the following page not list them >> http://llvm.org/docs/doxygen/html/classllvm_1_1MVT.html >> >> is that intentional for any reason or can I just add them ? > > As far as I know, they're not there simply...

2014 Oct 26

[LLVMdev] Masked vector intrinsics and name mangling

Hi, The proposed masked vector intrinsics are overloaded: one intrinsic ID for multiple types. After name mangling, a call will look like: %res = call <16 x i32> @llvm.masked.load.v16i32.p0i32.v16i32.i32.v16i1(i32* %addr, <16 x i32> %passthru, i32 4, <16 x i1> %mask) With 6 types x 3 vector sizes, that gives 18 names for one operation. I propose to remove name mangling from these intrinsics: %res = call <16 x i32> @llvm.masked.load (i32* %addr, <16 x i32> %passthru, i32 4, ...
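To see how the long name in this message is assembled, here is a minimal sketch that joins a base intrinsic name with one suffix per overloaded type. It is not LLVM's actual mangling routine; `mangleIntrinsicName` is a hypothetical helper, and the suffix list is simply the one that appears in the example call.

```cpp
#include <string>
#include <vector>

// Appends ".<suffix>" to the base name for every overloaded type, in order.
std::string mangleIntrinsicName(const std::string& base,
                                const std::vector<std::string>& typeSuffixes) {
    std::string name = base;
    for (const std::string& suffix : typeSuffixes) {
        name += '.';
        name += suffix;
    }
    return name;
}
```

Passing "llvm.masked.load" with the suffixes v16i32, p0i32, v16i32, i32 and v16i1 reproduces the name quoted in the message. Every distinct combination of element type and vector size yields a distinct name, which is the growth the proposal wants to avoid.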

2014 Oct 26

[LLVMdev] Masked vector intrinsics and name mangling

...>> Cc: llvmdev at cs.uiuc.edu >> Sent: Sunday, October 26, 2014 10:17:49 AM >> Subject: RE: [LLVMdev] Masked vector intrinsics and name mangling >> >> Hal, thank you for your opinion. >> I just was confused when I saw so long name " >> llvm.masked.load.v16i32.p0i32.v16i32.i32.v16i1" . >> If we stay with a short name, we do a step towards instruction form. > > I completely understand, I just don't think it matters all that much, and the logic necessary to handle it will just become a source of bugs (and thus a distraction). You don...

2010 Jul 17

[LLVMdev] v16i32/v16f32

On Fri, Jul 16, 2010 at 5:14 PM, shreyas krishnan <shreyas76 at gmail.com> wrote: > I tried adding them in my backend however I run into the assertion > > assert((unsigned)VT.SimpleTy < sizeof(LoadExtActions[0])*4 && > ExtType < array_lengthof(LoadExtActions) && > "Table isn't big enough!"); > > What does the

2010 Jul 17

[LLVMdev] v16i32/v16f32

Thanks Eli... I actually did that: I bumped it up by the 2 that I had added. Is there anything else that I might have done wrong? I can see a different assert where it clearly depends on LAST_VALUETYPE: assert((unsigned)VT.SimpleTy < MVT::LAST_VALUETYPE thanks shrey On Fri, Jul 16, 2010 at 5:20 PM, Eli Friedman <eli.friedman at gmail.com> wrote: > On Fri, Jul 16, 2010 at 5:14 PM, shreyas

2013 May 09

[LLVMdev] Predicated Vector Operations

On May 9, 2013, at 3:05 PM, Jeff Bush <jeffbush001 at gmail.com> wrote: > On Thu, May 9, 2013 at 8:10 AM, <dag at cray.com> wrote: >> Jeff Bush <jeffbush001 at gmail.com> writes: >> >>> %tx = select %mask, %x, <0.0, 0.0, 0.0 ...> >>> %ty = select %mask, %y, <0.0, 0.0, 0.0 ...> >>> %sum = fadd %tx, %ty >>> %newvalue

2013 May 10

[LLVMdev] Predicated Vector Operations

..."$dst = $oldvalue" in { >> def MASKEDARITH : MyInstruction< >> (outs VectorReg:$dst), >> (ins MaskReg:$mask, VectorReg:$src1, VectorReg:$src2, >> VectorReg:$oldvalue), >> "add $dst {$mask}, $src1, $src2", >> [(set v16i32:$dst, (vselect v16i1:$mask, (add v16i32:$src1, >> v16i32:$src2), v16i32:$oldvalue))]>; >> } > > Ok, but where does $oldvalue come from? That is the tricky part as far > as I can see and is why this isn't quite the same as handling > two-address instructions. >...

2009 Nov 10

[LLVMdev] Altivec vs the type legalizer

PPC Altivec supports vector type v16i8 (and others) where the element type is not legal (in llvm's implementation). When we have a BUILD_VECTOR of these types with constant elements, LegalizeTypes first promotes the element types to i32, then builds a constant pool entry of type v16i32. This is wrong. I can fix it by truncating the elements back to i8 in ExpandBUILD_VECTOR. Does this seem like the right approach? I ask because we'll be relying on ConstantVector::get and getConstantPool to work even with elements of a type that's illegal for the target; current...
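The fix proposed in this message, truncating the promoted elements back to i8 before the constant is built, can be illustrated with a small sketch. It models the elements as plain integers and is not the ExpandBUILD_VECTOR code itself; `PromotedV16`, `V16i8` and `truncateElements` are hypothetical names.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using PromotedV16 = std::array<std::uint32_t, 16>;  // elements after promotion to i32
using V16i8 = std::array<std::uint8_t, 16>;         // the vector type the target supports

// Truncates each promoted 32-bit element back to its low 8 bits, so the
// resulting constant has the layout of a v16i8 rather than a v16i32.
V16i8 truncateElements(const PromotedV16& promoted) {
    V16i8 result{};
    for (std::size_t i = 0; i < result.size(); ++i) {
        result[i] = static_cast<std::uint8_t>(promoted[i] & 0xFFu);
    }
    return result;
}
```

The difference matters for the constant pool: sixteen 32-bit elements occupy 64 bytes, while the v16i8 value the target actually loads occupies 16.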

2013 Jun 24

[LLVMdev] Matching patterns

I'm trying to create a TableGen pattern to match extract_vector_elt. My pattern looks like this: (set i32:$dest, (extract_vector_elt v16i32:$src, i32:$index)) However, when I compile, I get an error: error: Variable not defined: 'extract_vector_elt' However, if I omit the rule and attempt to compile something that uses this functionality with clang, I get this error, which is definitely using the name 'extract_vector...

2015 Mar 15

[LLVMdev] Indexed Load and Store Intrinsics - proposal

...provide the > result equivalent to serial scalar stores from least to most > significant vector elements. > > The new intrinsics are common for all targets, like recently > introduced masked load and store. > > Examples: > > <16 x float> @llvm.sindex.load.v16f32.v16i32 (i8 *%ptr, <16 x i32> %index, > i32 %scale) > <16 x float> @llvm.masked.sindex.load.v16f32.v16i32 (i8 *%ptr, <16 x i32> > %index, <16 x float> %passthru, <16 x i1> %mask) > void @llvm.sindex.store.v16f32.v16i64(i8* %ptr, <16 x float> %value, ...

2020 Jun 25

How to implement load/store for vector predicate register

...r short). The hardware has 64 vector registers (vr for short) and 8 vector predicate registers. There are no move instructions between vr and vpr. vr supports many operations, and vpr supports the vpror, vprxor, vprand and vprinv operations. A vr has 512 bits, and a vpr has 128 bits. vr is used for v16i32, v32i16 and v64i8. A scalar register has 32 bits. If we compare or add two v16i32, an element in vpr has 8 bits. If we compare or add two v64i8, then an element in vpr has 2 bits (one bit for the compare flag and one bit for the carry flag). An element in vpr contains the carry flag and the compare flag. We have de...
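The per-element predicate widths quoted in this message follow from dividing the 128 predicate bits evenly among the lanes of the 512-bit vector. A short sketch of that arithmetic, with hypothetical constant and function names:

```cpp
// Register widths as stated in the message.
constexpr unsigned kVectorRegBits = 512;
constexpr unsigned kPredicateRegBits = 128;

// Number of lanes in a vector register for a given element width.
constexpr unsigned laneCount(unsigned elementBits) {
    return kVectorRegBits / elementBits;
}

// Predicate bits available per lane when the predicate register is
// split evenly among the lanes.
constexpr unsigned predicateBitsPerElement(unsigned elementBits) {
    return kPredicateRegBits / laneCount(elementBits);
}

// v16i32: 16 lanes, 8 predicate bits each. v64i8: 64 lanes, 2 bits each.
static_assert(predicateBitsPerElement(32) == 8, "v16i32 gets 8 bits per element");
static_assert(predicateBitsPerElement(8) == 2, "v64i8 gets 2 bits per element");
```

By the same division, v32i16 would get 4 predicate bits per element, although the message does not state that case explicitly.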

2014 Dec 18

[LLVMdev] Indexed Load and Store Intrinsics - proposal

..., two indices with same or close values) will provide the result equivalent to serial scalar stores from least to most significant vector elements. The new intrinsics are common for all targets, like recently introduced masked load and store. Examples: <16 x float> @llvm.sindex.load.v16f32.v16i32 (i8 *%ptr, <16 x i32> %index, i32 %scale) <16 x float> @llvm.masked.sindex.load.v16f32.v16i32 (i8 *%ptr, <16 x i32> %index, <16 x float> %passthru, <16 x i1> %mask) void @llvm.sindex.store.v16f32.v16i64(i8* %ptr, <16 x float> %value, <16 x i64> %inde...
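A scalar model may help in reading the proposed load signatures. The sketch below assumes that lane i reads a float from the byte address ptr + index[i] * scale; that addressing rule is an assumption, since the excerpt is truncated before it is spelled out. `indexedLoad` and `maskedIndexedLoad` are hypothetical names, and the `scale` parameter on the masked variant is added here for illustration only (the quoted masked signature does not show one).

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

using V16f32 = std::array<float, 16>;
using IndexVec = std::array<std::int32_t, 16>;
using MaskVec = std::array<bool, 16>;

// Assumed semantics: lane i loads a float from ptr + index[i] * scale (bytes).
V16f32 indexedLoad(const std::uint8_t* ptr, const IndexVec& index,
                   std::int32_t scale) {
    V16f32 result{};
    for (std::size_t i = 0; i < result.size(); ++i) {
        const std::ptrdiff_t offset =
            static_cast<std::ptrdiff_t>(index[i]) * scale;
        // memcpy avoids alignment and aliasing problems on the byte pointer.
        std::memcpy(&result[i], ptr + offset, sizeof(float));
    }
    return result;
}

// Masked variant: lanes with a clear mask bit take the passthru value and
// do not touch memory. The scale parameter is an illustrative assumption.
V16f32 maskedIndexedLoad(const std::uint8_t* ptr, const IndexVec& index,
                         std::int32_t scale, const V16f32& passthru,
                         const MaskVec& mask) {
    V16f32 result = passthru;
    for (std::size_t i = 0; i < result.size(); ++i) {
        if (mask[i]) {
            const std::ptrdiff_t offset =
                static_cast<std::ptrdiff_t>(index[i]) * scale;
            std::memcpy(&result[i], ptr + offset, sizeof(float));
        }
    }
    return result;
}
```

With a float array as the base and `scale` set to `sizeof(float)`, each index selects an array element, so the loop behaves like a gather.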
