thr3ads.net - search: "fp32"

Displaying 20 results from an estimated 38 matches for "fp32".

2011 Mar 18

[LLVMdev] [PATCH] OpenCL half support

...m like 16bit ints, but it would be nice to be able to represent them correctly. Maybe worth pointing out that there are architectures that natively support 16bit floating point in llvm. PTX, the new backend of which has just been added to 2.9 can handle fp16 -> fp32 conversion in hardware. I agree we should have support for fp16 in the IR, it's fiddly trying to make do without this and gets used frequently in simulations and graphics in particular....

2011 Oct 08

[LLVMdev] Enhancing TableGen

...if you don't want an extra layer of abstraction (which adds extra looking-ups to someone reading td files), but I think we can have for-loop inside a multiclass without abstractions. -------------------- multiclass sse_binop<opcode> { for type = [f32, f64, v4f32, v2f64] regclass = [FP32, FP64, VR128, VR128] suffix = [ss, sd, ps, pd] { def !toupper(suffix)#rr : Instr< [(set (type regclass:$dst), (type (opcode (type regclass:$src1), (type regclass:$src2))))]>; def !toupper(suffix)#rm : Instr< [(set (type r...

2006 Oct 10

[LLVMdev] FP emulation

...thing complex here. >> For the time being, I'd suggest defining an "fp register set" which >> just aliases the integer register set (i.e. say that d0 overlaps >> r0+r1). > > OK. I almost did this way already. But I introduced two FP register > sets. One for fp32 (for the future) and one for fp64. fp32 aliases the > integer register set. fp64 aliases the fp32 register set, but not the > integer register set explicitly. I thought that aliases are transitive? > Or do I have to mention all aliases explicitly, e.g. for %d0 I need to > say [%s0,%s1,%...

2006 Oct 10

[LLVMdev] FP emulation

...don't allow the FP side to use it. > > For the time being, I'd suggest defining an "fp register set" which > just aliases the integer register set (i.e. say that d0 overlaps > r0+r1). OK. I almost did this way already. But I introduced two FP register sets. One for fp32 (for the future) and one for fp64. fp32 aliases the integer register set. fp64 aliases the fp32 register set, but not the integer register set explicitly. I thought that aliases are transitive? Or do I have to mention all aliases explicitly, e.g. for %d0 I need to say [%s0,%s1,%GR0,%GR1]? But a mo...

2019 Sep 05

ARM vectorized fp16 support

...rate fused-multiply-add instructions for c += a * b. I'm wondering whether I did something wrong, if not, is it a missing feature that will be supported later? (I know there're fp16 FMLA intrinsics though) Test programs and outputs, $ clang -O3 -march=armv8.2-a+fp16fml -ffast-math -S -o- vfp32.c test_vfma_lane_f16: // @test_vfma_lane_f16 fmla v2.4s, v1.4s, v0.4s // fp32 is GOOD mov v0.16b, v2.16b ret $ cat vfp32.c #include <arm_neon.h> float32x4_t test_vfma_lane_f16(float32x4_t a, float32x4_t b, float...

2017 Jan 23

Changes to TableGen in v4.0?

I am trying to upgrade to the LLVM v4.0 branch, but I am seeing failures in my TableGen descriptions for conversion from FP32 to FP16 (scalar and vector). The patterns I have are along the lines of: [(set (f16 RF16:$dst), (fround (f32 RF32:$src)))] or: [(set (v2f16 VF16:$dst), (fround (v2f32 VF32:$src)))] and these now produce the errors: error: In CONV_f32_f16: Type inference contradiction found, mergin...

2011 Mar 18

[LLVMdev] [PATCH] OpenCL half support

> Maybe worth pointing out that there are architectures that natively support > 16bit floating point in llvm. PTX, the new backend of which has just been > added to 2.9 can handle fp16 -> fp32 conversion in hardware. FWIW: there are already intrinsics for such conversions (currently only used in ARM backend). There is no need for new type if you want just to convert stuff. -- With best regards, Anton Korobeynikov Faculty of Mathematics and Mechanics, Saint Petersburg State University

2011 Oct 09

[LLVMdev] Enhancing TableGen

...yer of abstraction (which > adds extra looking-ups to someone reading td files), but I think we > can have for-loop inside a multiclass without abstractions. > > -------------------- > multiclass sse_binop<opcode> { > for type = [f32, f64, v4f32, v2f64] > regclass = [FP32, FP64, VR128, VR128] > suffix = [ss, sd, ps, pd] { > > def !toupper(suffix)#rr : Instr< > [(set (type regclass:$dst), (type (opcode (type regclass:$src1), > (type regclass:$src2))))]>; > def !toupper(suffix)#rm : Ins...

2019 Sep 05

ARM vectorized fp16 support

...ons > for c += a * b. I'm wondering whether I did something wrong, if not, > is it a missing feature that will be supported later? (I know there're > fp16 FMLA intrinsics though) > > Test programs and outputs, > > $ clang -O3 -march=armv8.2-a+fp16fml -ffast-math -S -o- vfp32.c > test_vfma_lane_f16: // @test_vfma_lane_f16 > fmla v2.4s, v1.4s, v0.4s // fp32 is GOOD > mov v0.16b, v2.16b > ret > $ cat vfp32.c > #include <arm_neon.h> > float32x4_t test_vfma_lane_f16(...

2011 Oct 07

[LLVMdev] Enhancing TableGen

...rns to express this? Using the for-loop syntax: // WARNING: Pseudo-code, many details elided for presentation purposes. multiclass binop<opcode> : sse_binop<opcode>, avx_binop<opcode>; multiclass sse_binop<opcode> { for type = [f32, f64, v4f32, v2f64] regclass = [FP32, FP64, VR128, VR128] suffix = [ss, sd, ps, pd] { def !toupper(suffix)#rr : Instr< [(set (type regclass:$dst), (type (opcode (type regclass:$src1), (type regclass:$src2))))]>; def !toupper(suffix)#rm : Instr< [(set (...

2006 Oct 09

[LLVMdev] FP emulation

On Mon, 9 Oct 2006, Roman Levenstein wrote: > I'm now ready to implement the FP support for my embedded target. cool. > My target supports only f64 at the moment. > Question: How can I tell LLVM that float is the same as double on my > target? May be by assigning the same register class to both MVT::f32 > and MVT::f64? Just don't assign a register class for the f32 type.

2015 May 18

Tessellation shaders get MEM_OUT_OF_BOUNDS errors / missing triangles

...; (see gf100_gr_trap_mp). I assume some of the tessellation evaluation invocations get killed, but I have no proof of this. I also see this: TRAP ch 5 [0x003facf000 shader_runner[19044]] I would imagine that's some floating point number ending up in the register instead of an address, but the fp32 value of it (1.35107421875) does not seem familiar. Even when all the triangles show up, I still see the error on the GK208, so I'm not sure if they're the same issue or not. Now, here's the fun part -- this is completely non-deterministic. Sometimes everything shows up on the GK208,...
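The fp32 reading quoted in the snippet above can be reproduced with a short C sketch that reinterprets the 32-bit pattern 0x3facf000 as an IEEE-754 single-precision value. The helper name bits_to_float is hypothetical and is not taken from the thread; the sketch also assumes that float is a 32-bit IEEE-754 type on the host.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reinterpret a 32-bit pattern as an IEEE-754
 * binary32 value. memcpy avoids the undefined behaviour of a pointer cast.
 * Assumes sizeof(float) == 4 on the host. */
static float bits_to_float(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

Calling bits_to_float(0x3facf000u) yields 1.35107421875: the exponent field is 0x7f (a scale of 2^0) and the fraction field 0x2cf000 contributes 0.35107421875. This only shows how the number in the message was obtained; it says nothing about what the trap value actually represents.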

2016 Feb 11

Vectorization with fast-math on irregular ISA sub-sets

...ng the handling of denormals - scalar and vector - and we ran into a related problem only a few days ago. The v3.8 compiler has done a lot of good work on optimisations for floating-point math, but ironically one of them broke our implementation of 'nextafterf'. The desired code fragment (FP32) is: float xAbs = fabsf(x); since we know our instruction for this does not handle denormals and the algorithm is sensitive to correct denormals, the code was written to avoid this issue as follows: float xAbs = __builtin_astype(__builtin_astype(x, unsigned) & 0x7FFFFFFF, float); But th...
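The bit-mask form of fabsf quoted in the snippet above can be sketched in portable C. This version swaps the __builtin_astype calls for memcpy, and the name abs_by_mask is hypothetical; it assumes a 32-bit IEEE-754 float. Whether a given compiler keeps this as an integer operation is not something the sketch guarantees.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: absolute value computed by clearing the sign bit
 * (bit 31) of the binary32 encoding with an integer AND, so no
 * floating-point instruction is needed at the source level.
 * Assumes sizeof(float) == 4 and IEEE-754 encoding. */
static float abs_by_mask(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);   /* float -> raw bits */
    bits &= 0x7FFFFFFFu;              /* clear the sign bit */
    memcpy(&x, &bits, sizeof x);      /* raw bits -> float */
    return x;
}
```

Because only the sign bit changes, the exponent and fraction fields (including those of denormal inputs) pass through unmodified.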

2006 Oct 09

[LLVMdev] FP emulation

Hi, I'm now ready to implement the FP support for my embedded target. My target supports only f64 at the moment. Question: How can I tell LLVM that float is the same as double on my target? May be by assigning the same register class to both MVT::f32 and MVT::f64? But FP is supported only in the emulated mode, because the target does not have any hardware support for FP. Therefore each FP

2015 May 26

Tessellation shaders get MEM_OUT_OF_BOUNDS errors / missing triangles

...llation evaluation >> invocations get killed, but I have no proof of this. >> >> I also see this: TRAP ch 5 [0x003facf000 shader_runner[19044]] >> >> I would imagine that's some floating point number ending up in the >> register instead of an address, but the fp32 value of it >> (1.35107421875) does not seem familiar. > > Ben pointed out that the 0x3facf000 is a channel address, not a value > from the shader. Oops. So that theory completely doesn't hold water. > Perhaps some buffer isn't big enough? This ends up using 9 output >...

2011 Oct 07

[LLVMdev] Enhancing TableGen

...> syntax: > > // WARNING: Pseudo-code, many details elided for presentation purposes. > > multiclass binop<opcode> : sse_binop<opcode>, avx_binop<opcode>; > > multiclass sse_binop<opcode> { > for type = [f32, f64, v4f32, v2f64] > regclass = [FP32, FP64, VR128, VR128] > suffix = [ss, sd, ps, pd] { > > def !toupper(suffix)#rr : Instr< > [(set (type regclass:$dst), (type (opcode (type regclass:$src1), > (type regclass:$src2))))]>; > def !toupper(suffix)#rm...

2011 Oct 08

[LLVMdev] Enhancing TableGen

...> syntax: > > // WARNING: Pseudo-code, many details elided for presentation purposes. > > multiclass binop<opcode> : sse_binop<opcode>, avx_binop<opcode>; > > multiclass sse_binop<opcode> { > for type = [f32, f64, v4f32, v2f64] > regclass = [FP32, FP64, VR128, VR128] > suffix = [ss, sd, ps, pd] { > > def !toupper(suffix)#rr : Instr< > [(set (type regclass:$dst), (type (opcode (type regclass:$src1), > (type regclass:$src2))))]>; > def !toupper(suffix)#rm :...

2012 Apr 11

[LLVMdev] float16/half float support situation? (and a problem)

OpenCL defines half data type, and it seems clang accepts this and generates code for it. The backend support for operations with fp16 seems to be missing and it works (or should work?) by converting these to fp32 for the actual calculations? But I'm having problems with this. first I just tried to use fp16 data type, without any support in backend. This was expected to fail. I got error: LLVM ERROR: Cannot select: 0x2f566b0: i32 = fp32_to_fp16 0x2f66bb0 [ID=876] So I created an instruction patte...
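For readers unfamiliar with what an fp16 to fp32 widening involves, here is a minimal software sketch in C. It is not LLVM's implementation and not the instruction pattern mentioned in the message; the name half_to_float is hypothetical, and the code assumes IEEE-754 binary16 input and a 32-bit IEEE-754 float on the host.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: widen an IEEE-754 binary16 bit pattern to a
 * binary32 value. Every binary16 value is exactly representable in
 * binary32, so no rounding is involved. */
static float half_to_float(uint16_t h)
{
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16; /* sign to bit 31 */
    uint32_t exp  = (h >> 10) & 0x1Fu;             /* 5-bit exponent */
    uint32_t man  = h & 0x3FFu;                    /* 10-bit fraction */
    uint32_t bits;
    float f;

    if (exp == 0x1Fu) {
        /* Infinity or NaN: all-ones exponent, keep the payload bits. */
        bits = sign | 0x7F800000u | (man << 13);
    } else if (exp != 0) {
        /* Normal number: rebias the exponent from 15 to 127 (+112). */
        bits = sign | ((exp + 112u) << 23) | (man << 13);
    } else if (man == 0) {
        /* Signed zero. */
        bits = sign;
    } else {
        /* Subnormal half: shift until the implicit leading 1 appears,
         * lowering the exponent once per shift. */
        uint32_t e = 113u;
        while ((man & 0x400u) == 0) {
            man <<= 1;
            e--;
        }
        bits = sign | (e << 23) | ((man & 0x3FFu) << 13);
    }

    memcpy(&f, &bits, sizeof f);
    return f;
}
```

The opposite direction (fp32 to fp16, the fp32_to_fp16 node named in the error message) additionally needs rounding and overflow handling, which this sketch does not cover.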

2006 Oct 11

[LLVMdev] FP emulation

...For the time being, I'd suggest defining an "fp register set" > which > >> just aliases the integer register set (i.e. say that d0 overlaps > >> r0+r1). > > > > OK. I almost did this way already. But I introduced two FP register > > sets. One for fp32 (for the future) and one for fp64. fp32 aliases > the > > integer register set. fp64 aliases the fp32 register set, but not > the > > integer register set explicitly. I thought that aliases are > transitive? > > Or do I have to mention all aliases explicitly, e.g. for %d0...

2016 Jul 13

RFC: SIMD math-function library

Dear LLVM contributors, I am Naoki Shibata, an associate professor at Nara Institute of Science and Technology. I and Hal Finkel would like to jointly propose to add my vectorized math library to LLVM. The library has been available as public domain software for years, I am going to double-license the library if necessary. ******** Below is a proposal to add my vectorized math library,