thr3ads.net - search: "v4f32"

Displaying 20 results from an estimated 151 matches for "v4f32".

2020 Aug 31

Should llvm optimize 1.0 / x ?

Hi, Here is a small C++ program: vec.cc: #include <cmath> using v4f32 = float __attribute__((__vector_size__(16))); v4f32 fct1(v4f32 x) { return 1.0 / x; } v4f32 fct2(v4f32 x) { return __builtin_ia32_rcpps(x); } Which is compiled to: vec.o: file format elf64-x86-64 Disassembly of section .text: 0000000000000000 <_Z4fct1Dv4_f>: 0: c4 e2 79 18 0...

2011 Sep 22

[LLVMdev] Patch to synthesize x86 hadd instructions; need help with the tablegen bits

Hi Bruno, > Some comments: > > + // Try to synthesize horizontal adds from adds of shuffles. > + if (((Subtarget->hasSSE3() && (VT == MVT::v4f32 || VT == MVT::v2f64)) || > + (Subtarget->hasAVX() && (VT == MVT::v8f32 || VT == MVT::v4f64))) && > + isHorizontalBinOp(LHS, RHS, true)) > > 1) You probably want to do something like: > > "bool HasHorizontalArith = Subtarget->hasSSE3() || > S...

2019 Feb 04

[RFC] Vector Predication

...rote: > > > On Friday, February 1, 2019, Simon Moll <moll at cs.uni-saarland.de> wrote: > > We could untie the mask length from the data length: > > %result = call <scalable 4 x float> > @llvm.evl.fsub.v4f32(<scalable 4 x float> %x, <scalable 4 x float> > %y, <scalable 1 x i1> %M, i32 %L) > > would then indicate the mask %M applies to groups of "4 / 1" float > elements. > > > That would provide the greatest flexibility, as a 1:1 ratio could m...

2011 Sep 21

[LLVMdev] Patch to synthesize x86 hadd instructions; need help with the tablegen bits

...hy!). > I'm sending the patch for comments, and in the hope that someone will > explain > how I should be doing the tablegen bits. This is awesome :D Some comments: + // Try to synthesize horizontal adds from adds of shuffles. + if (((Subtarget->hasSSE3() && (VT == MVT::v4f32 || VT == MVT::v2f64)) || + (Subtarget->hasAVX() && (VT == MVT::v8f32 || VT == MVT::v4f64))) && + isHorizontalBinOp(LHS, RHS, true)) 1) You probably want to do something like: "bool HasHorizontalArith = Subtarget->hasSSE3() || Subtarget->hasAVX()" and...

2020 Sep 01

Should llvm optimize 1.0 / x ?

...-Quentin > > > > On Aug 31, 2020, at 2:21 PM, Alexandre Bique via llvm-dev <llvm-dev at lists.llvm.org> wrote: > > > > Hi, > > > > Here is a small C++ program: > > > > vec.cc: > > > > #include <cmath> > > > > using v4f32 = float __attribute__((__vector_size__(16))); > > > > v4f32 fct1(v4f32 x) > > { > > return 1.0 / x; > > } > > > > v4f32 fct2(v4f32 x) > > { > > return __builtin_ia32_rcpps(x); > > } > > > > Which is compiled to: > > >...

2011 Sep 22

[LLVMdev] Patch to synthesize x86 hadd instructions; need help with the tablegen bits

...o: Bruno Cardoso Lopes Cc: LLVMdev Subject: Re: [LLVMdev] Patch to synthesize x86 hadd instructions; need help with the tablegen bits Hi Bruno, > Some comments: > > + // Try to synthesize horizontal adds from adds of shuffles. > + if (((Subtarget->hasSSE3() && (VT == MVT::v4f32 || VT == MVT::v2f64)) || > + (Subtarget->hasAVX() && (VT == MVT::v8f32 || VT == MVT::v4f64))) && > + isHorizontalBinOp(LHS, RHS, true)) > > 1) You probably want to do something like: > > "bool HasHorizontalArith = Subtarget->hasSSE3() || > S...

2011 Sep 21

[LLVMdev] Patch to synthesize x86 hadd instructions; need help with the tablegen bits

This patch synthesizes haddps/haddpd/hsubps/hsubpd instructions from floating point additions and subtractions of appropriate vector shuffles. To do this I introduced new x86 FHADD and FHSUB opcodes. These need to be wired up somehow in the .td file to the appropriate instructions. Since I have no idea how tablegen works I just hacked it in horribly. It works, but breaks support for the hadd
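
To make the pattern in this snippet concrete: `haddps a, b` produces `{a0+a1, a2+a3, b0+b1, b2+b3}`, which equals the sum of two shuffles of `(a, b)`, one selecting the even lanes and one the odd lanes. The sketch below, assuming GCC/Clang vector extensions, spells out that equivalence with plain lane accesses. All function names are hypothetical and are not taken from the patch.

```cpp
using v4f32 = float __attribute__((__vector_size__(16)));

// Even lanes of the concatenation (a, b), i.e. shuffle mask <0, 2, 4, 6>.
v4f32 shuffle_even(v4f32 a, v4f32 b) {
  v4f32 r = {a[0], a[2], b[0], b[2]};
  return r;
}

// Odd lanes of the concatenation (a, b), i.e. shuffle mask <1, 3, 5, 7>.
v4f32 shuffle_odd(v4f32 a, v4f32 b) {
  v4f32 r = {a[1], a[3], b[1], b[3]};
  return r;
}

// The "add of shuffles" form that a combine could recognize.
v4f32 add_of_shuffles(v4f32 a, v4f32 b) {
  return shuffle_even(a, b) + shuffle_odd(a, b);
}

// Reference semantics of a 4 x float horizontal add.
v4f32 hadd_reference(v4f32 a, v4f32 b) {
  v4f32 r = {a[0] + a[1], a[2] + a[3], b[0] + b[1], b[2] + b[3]};
  return r;
}
```

Both functions perform the same four additions on the same operands, so they agree lane for lane. The subtraction variants follow the same shape with `-` in place of `+`.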

2016 Mar 30

infer correct types from the pattern

i'm getting a "Could not infer all types in pattern!" error in my backend. it is happening on the following instruction: VGETITEM: (set GPR:{i32:f32}:$rD, (extractelt:{i32:f32} VR:{v4i32:v4f32}:$rA, GPR:i32:$rB)). how do i make it use appropriate types? in other words if it is f32 then use v4f32 and if it is i32 then use v4i32. i'm not sure even where to start? any help is appreciated. -- Rail Shafigulin Software Engineer Esencia Technologies ...

2020 Aug 31

Vectorization of math function failed?

Hi, After reading https://llvm.org/docs/Vectorizers.html#vectorization-of-function-calls I decided to write the following C++ program: #include <cmath> using v4f32 = float __attribute__((__vector_size__(16))); v4f32 fct1(v4f32 x) { v4f32 y; y[0] = std::sin(x[0]); y[1] = std::sin(x[1]); y[2] = std::sin(x[2]); y[3] = std::sin(x[3]); return y; } v4f32 fct2(v4f32 x) { v4f32 y; for (int i = 0; i < 4; ++i) y[i] = std::sin(x[i]); return y;...

2009 Jun 04

[LLVMdev] TableGen Type Inference

Can someone explain why TableGen can't figure this out? VCVTDQ2PS128rm: (set:isVoid VR128:v4f32:$dst, (sint_to_fp:v4f32 (bitconvert:isInt (ld:v4i32 addr:iPTR:$src)<<P:Predicate_memop>>))) llvm/tblgen: In VCVTDQ2PS128rm: Could not infer all types in pattern! The pattern as written looks like this: [(set VR128:$dst, (v4f32 (sint_to_fp (bc_memopv4i32 addr:$src))))] I'm trying...

2016 Mar 30

infer correct types from the pattern

On 3/30/2016 4:42 PM, Rail Shafigulin via llvm-dev wrote: > i'm getting a > > Could not infer all types in pattern! > > error in my backend. it is happening on the following instruction: > > VGETITEM: (set GPR:{i32:f32}:$rD, (extractelt:{i32:f32} > VR:{v4i32:v4f32}:$rA, GPR:i32:$rB)). > > how do i make it use appropriate types? in other words if it is f32 then > use v4f32 and if it is i32 then use v4i32. i'm not sure even where to start? You can use a cast, and force one type in the pattern, then use the other one in a Pat: def VGETITEM: [...

2019 Apr 04

[RFC] Changes to llvm.experimental.vector.reduce intrinsics

.... This behaviour is described in the LangRef (https://www.llvm.org/docs/LangRef.html#id1905) and is mentioned in https://bugs.llvm.org/show_bug.cgi?id=36734 and further discussed in D45336 and D59356. This means that for example: %res = call fast float @llvm.experimental.vector.reduce.fadd.f32.v4f32(float undef, <4 x float> %v) does not result in %res being 'undef', but rather a reduction of <4 x float> %v. The definitions of these intrinsics are different from their corresponding SelectionDAG nodes which explicitly split out a non-strict VECREDUCE_FADD that explicitly does...
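
The distinction in this snippet can be illustrated outside LLVM IR. The sketch below, assuming GCC/Clang vector extensions and IEEE single-precision evaluation, contrasts an ordered reduction that threads a start value through the lanes with a relaxed pairwise reduction that takes no start value. The function names are hypothetical, and the pairwise tree is only one of several orders a relaxed reduction might use.

```cpp
using v4f32 = float __attribute__((__vector_size__(16)));

// Ordered (strict) reduction: begin with the start value and add the
// lanes strictly left to right.
float reduce_fadd_ordered(float start, v4f32 v) {
  float acc = start;
  for (int i = 0; i < 4; ++i) acc += v[i];
  return acc;
}

// Relaxed reduction: no start value, lanes combined as a pairwise tree.
// Valid only when floating-point reassociation is permitted.
float reduce_fadd_pairwise(v4f32 v) {
  return (v[0] + v[1]) + (v[2] + v[3]);
}
```

Because float addition is not associative, the two forms can disagree. For an input such as `{1e8f, 1.0f, -1e8f, 1.0f}`, the small terms are absorbed at different points, so the ordered and pairwise results differ.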

2013 Mar 05

[LLVMdev] Vector splitting vs widening

...into the following problem with vector type legalization. Here's a quick example: Scalarize node result 0: 0x2348420: v1f32 = extract_subvector 0x23434a0, 0x2348320 [ID=0] Scalarize node result 0: 0x2348220: v1f32 = extract_subvector 0x23434a0, 0x23466e0 [ID=0] Split node result: 0x23469e0: v4f32 = extract_subvector 0x23435a0, 0x23466e0 [ID=0] Split node operand: 0x2346be0: v4i1 = setcc 0x23467e0, 0x23469e0, 0x23436a0 [ID=0] Split node result: 0x2348620: v2f32 = extract_subvector 0x23435a0, 0x2346de0 [ID=0] Widen node result 0: 0x2348820: v2i1 = setcc 0x2346ee0, 0x2348620, 0x23436a0 [ID=...

2019 Feb 01

[RFC] Vector Predication

...ation)? > Or, do you think the EVL proposal would need modification to > effectively support this (by adding an element group size argument to > EVL intrinsics or something)? We could untie the mask length from the data length: %result = call <scalable 4 x float> @llvm.evl.fsub.v4f32(<scalable 4 x float> %x, <scalable 4 x float> %y, <scalable 1 x i1> %M, i32 %L) would then indicate the mask %M applies to groups of "4 / 1" float elements. - Simon > Jacob Lifshay > > On Thu, Jan 31, 2019, 07:58 Simon Moll via llvm-dev > <llvm-...
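
A rough model of the grouping idea in this snippet, written as a sketch on fixed-size arrays rather than scalable vectors. It assumes that the explicit length counts data elements and that disabled lanes receive a placeholder value; neither detail is stated in the snippet. The function `evl_fsub` and its `inactive` parameter are hypothetical.

```cpp
#include <array>
#include <cstddef>

// Models a predicated subtraction where one mask bit governs a group of
// N / M consecutive data lanes (for example, 4 lanes per bit when the
// data has 4x as many elements as the mask).
template <std::size_t N, std::size_t M>
std::array<float, N> evl_fsub(const std::array<float, N>& x,
                              const std::array<float, N>& y,
                              const std::array<bool, M>& mask,
                              std::size_t evl,          // assumed: counts data lanes
                              float inactive = 0.0f) {  // assumed placeholder value
  static_assert(M > 0 && N % M == 0, "mask length must divide data length");
  constexpr std::size_t group = N / M;
  std::array<float, N> result{};
  for (std::size_t i = 0; i < N; ++i) {
    // A lane is active if it is below the explicit length and the mask
    // bit for its group is set.
    const bool active = i < evl && mask[i / group];
    result[i] = active ? x[i] - y[i] : inactive;
  }
  return result;
}
```

With 8 data lanes and a 2-bit mask, each bit covers 4 lanes. A 1:1 ratio between mask bits and data lanes falls out as the special case `M == N`.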

2013 Mar 22

[LLVMdev] UNREACHABLE executed! error while trying to generate PTX

...tried the command line given by you and I get the following error clang++ nbody.kernel.cu -Xclang -fcuda-is-device -I/home/upitamba/llvm-3.2.src/tools/clang/test/SemaCUDA/ -Xclang -triple -Xclang nvptx64 -Xclang -target-cpu -Xclang sm_10 -S fatal error: error in backend: Cannot select: 0x334a870: v4f32 = NVPTXISD::MoveParam 0x334a770 [ORD=1] [ID=22] 0x334a770: v4f32 = TargetExternalSymbol'.PARAM0' [ID=1] In function: computeBodyAccel Am I doing anything wrong here ? Attached my new nbody.kernel.cu <http://llvm.1065342.n5.nabble.com/file/n56141/nbody.kernel.cu> .cu here -...

2011 Jul 11

[LLVMdev] Opinions Wanted: New asm Comments

I have a patch I'd like to commit that adds commentary to asm files about which TableGen pattern generated a particular instruction. The output looks like this: cvtpd2ps %xmm0, %xmm0 # source.c:39 # Src: (intrinsic_wo_chain:v4f32 927:iPTR, VR128:v2f64:$src) # Dst: (Int_CVTPD2PSrr:v4f32 VR128:v2f64:$src) This is enormously helpful when trying to track down codegen bugs but clutters the asm file pretty badly for "ordinary" users. Right now I have this under control of a sep...

2020 Sep 01

Vector evolution?

On Tue, Sep 1, 2020 at 5:10 PM Florian Hahn <florian_hahn at apple.com> wrote: > The loop vectorizer does not really handle loops that already operate on vectors, so that is why the loop using v4f32 does not get widened. > > Arguably the user explicitly asked for 4xfloat vectors in the v4f32 version, so that is what gets generated. In my case I have tons of legacy code written for SSE2 and if the compiler can make a better and correct version of it, why not? > (Those kinds of issues...

2018 Jun 07

Matching ConstantFPSDNode tablegen

...ern for tablegen but am having some issues. So LLVM doesn't seem to accept a floating point constant literal match like: %v = call <4 x float> @foo(i32 15, float %s, float 0.0, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0) ret <4 x float> %v def : XXXPat<(v4f32 (int_foo i32:$mask, f32:$s, 0, v8i32:$rsrc, v4i32:$sampler, i1:$unorm, 0, i32:$cachepolicy)), (FOO_MI (COPY_TO_REGCLASS ?:$s, 32RegClass), ?:$rsrc, ?:$sampler, (as_i32imm ?:$mask), (as_i1imm ?:$unorm), (as_i1imm ?:$cachepolicy), (as_i1imm ?:$cachepolicy), 0, 0, 0, { 0 })>; which would be ideal....

2007 Nov 27

[LLVMdev] Other Intrinsics?

...n: what is the advantage of instructions over intrinsics? Why not get rid of fmod? If we got rid of fmod, why not fadd, fdiv, ? > The main reason for adding intrinsics instead of just using C library > calls is for support for vector types. @llvm.sin.* can be overloaded as > @llvm.sin.v4f32, for example, which is very useful for some users. My question is "how can these be used" by people. Specifically, these need to be lowered to some sort of runtime calls (no hardware has support for these) and llvm doesn't provide a standard runtime yet. Unless the codegen has a...
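
As an illustration of the lowering concern raised in this snippet: when no vector math routine is available, a call such as `@llvm.sin.v4f32` can only be expressed as one scalar library call per lane. A minimal sketch of that scalarized form, assuming GCC/Clang vector extensions; the name `sin_v4f32_scalarized` is hypothetical.

```cpp
#include <cmath>

using v4f32 = float __attribute__((__vector_size__(16)));

// Scalarized lowering of a 4-lane sine: extract each lane, call the
// scalar libm routine, and insert the result back into the vector.
v4f32 sin_v4f32_scalarized(v4f32 x) {
  v4f32 r = {0.0f, 0.0f, 0.0f, 0.0f};
  for (int i = 0; i < 4; ++i) r[i] = std::sin(x[i]);
  return r;
}
```

A runtime that ships a real vector sine could replace the loop with a single call, which is the piece the message says was missing at the time.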

2013 Jun 07

[LLVMdev] NEON vector instructions and the fast math IR flags

...t is not the vectorizer that is the issue, it is the ARM backend that > currently translates vectorized floating point IR to NEON instructions (it > should scalarize it if desired to do so - i.e. if people care about > denormals). > Hi Arnold, Can't the vectorizer not generate the v4f32 vectors in the first place, with that flag disabled? To fix this issue one would have to fix the backend: i.e not declare v4f32 > et al as legal (under a flag). As to making this predicated on fast math > flags on operations (something like no-denormals - i don’t think we have > that in...
