thr3ads.net - search: "float2"

Displaying 20 results from an estimated 23 matches for "float2".

[LLVMdev] [NVPTX] llc -march=nvptx64 -mcpu=sm_20 generates invalid zero align for device function params

2012 Jul 11

...by some reason given with .align 0, which is invalid. Problem does not occur if compiled for sm_10. > cat test.ll ; ModuleID = '__kernelgen_main_module' target datalayout = "e-p:64:64-i64:64:64-f64:64:64-n1:8:16:32:64" target triple = "ptx64-unknown-unknown" %struct.float2 = type { float, float } define ptx_device void @__internal_dsmul(%struct.float2* noalias nocapture sret %agg.result, %struct.float2* nocapture byval %x, %struct.float2* nocapture byval %y) nounwind inlinehint alwaysinline { entry: %y1 = getelementptr inbounds %struct.float2* %x, i64 0, i32 1 %...

[LLVMdev] Re : ANN: libclc (OpenCL C library implementation)

2011 Oct 20

...nk), it seems to use custom LLVM intrinsics, and is built around pure C macros. Clover uses a slightly more complex system, involving a Python script "compiling" a set of built-ins into four files. For example, this declaration (REPL is a macro that does a simple for()) : ---- def vecf : float2 float3 float4 float8 float16 native $type acospi $vecf : x:$type REPL($vecdim) result[i] = std::acos(x[i]) / M_PI; end ---- Is compiled to these fragments, one for each vector type (float2, float3, etc) : ---- // In stdlib_def.h : what the OpenCL C kernel sees float2 OVERLOAD acospi(...

[LLVMdev] Functions: sret and readnone

2009 Nov 05

...guments in the pass' getModRefBehavior methods. However, I haven't been successful with this approach and hope that someone has an idea on how to fix this. Here's a step by step illustration of the problem: 1. The following source code is compiled ... intrinsic float4 sample(int tex, float2 tc); float4 main(int tex, float2 tc) { float4 x = sample(tex, tc); return 0.0; } 2. ... into the following LLVM code (after a bunch of optimizations have run): define void @"main$int$float2"([4 x float]* noalias nocapture sret, i32, [2 x float]) nounwind { %5 = alloca [4 x float],...

[LLVMdev] [NVPTX] llc -march=nvptx64 -mcpu=sm_20 generates invalid zero align for device function params

2012 Nov 09

...hich is invalid. Problem does not occur if compiled > for sm_10. > >> cat test.ll > ; ModuleID = '__kernelgen_main_module' > target datalayout = "e-p:64:64-i64:64:64-f64:64:64-n1:8:16:32:64" > target triple = "ptx64-unknown-unknown" > > %struct.float2 = type { float, float } > > define ptx_device void @__internal_dsmul(%struct.float2* noalias nocapture > sret %agg.result, %struct.float2* nocapture byval %x, %struct.float2* > nocapture byval %y) nounwind inlinehint alwaysinline { > entry: > %y1 = getelementptr inbounds %struct...

[LLVMdev] Functions: sret and readnone

2009 Oct 05

Hi all, I'm currently building a DSL for a computer graphics project that is not unlike NVIDIA's Cg. I have an intrinsic with the following signature float4 sample(texture tex, float2 coords); that is translated to this LLVM IR code: declare void @"sample"(%float4* noalias nocapture sret, %texture, $float2) nounwind readnone The type float4 is basically an array of four floats, which cannot be returned directly on an x86 using the traditional calling conventions but...

[LLVMdev] Functions: sret and readnone

2009 Oct 06

On 5 Oct., 23:33, Dan Gohman <goh... at apple.com> wrote: > > Is there a reason it needs to be an array? A vector of four floats > wouldn't have this problem, if that's an option. > Unfortunately that's not an option. At the moment I'm restricting myself to the use of scalar code only, in order to be able to vectorize the code easily later (e.g., float4 as it is

[LLVMdev] ANN: libclc (OpenCL C library implementation)

2011 Oct 20

Hi Carlos, On 10/20/11 9:54 AM, Carlos Sánchez de La Lama wrote: >> The project started as a use-case for our "Whole-Function Vectorization" >> library, which allows to transform a function to compute the same as W >> executions of the original code by using SIMD instructions (W = 4 for >> SSE/AltiVec, 8 for AVX). > > Quite interesting. We were planning to

[LLVMdev] Alignment of vectors

2008 Jul 18

Consider the following C code: typedef __attribute__(( ext_vector_type(2) )) float float2; typedef __attribute__(( ext_vector_type(2) )) __attribute__(( aligned(4) )) float float2_align2; void foo(void) { const float * p; size_t offset; float2 tmp = *((float2_align2 *)(p+offset)); } When compiled with clang emit-llvm I get: define void @foo() { entry: %p = alloca float*, a...

[LLVMdev] Modeling GPU vector registers, again (with my implementation)

2009 Feb 16

...require having extra moves, but your example below would end up being something like the following: dp4 r100, r1, r2 mov r0.x, r100 (float4 => float1 extract_vector_elt) dp4 r101, r4, r5 mov r3.x, r101 (float4 => float1 extract_vector_elt) iadd r6.xy__, r0.x000, r3.0x00(float1 + float1 => float2 build_vector) dp4 r7.x, r8, r9 <as above> dp4 r10.x, r11, r12 <as above> iadd r13.xy__, r7.x000, f10.0x00(float1 + float1 => float2 build_vector) iadd r14, r13.xy00, r6.00xy (float2 + float2 => float4 build_vector) sub r15, r14, r9 It's not as compact and neat but it works an...

[LLVMdev] Modeling GPU vector registers, again (with my implementation)

2009 Feb 16

Evan Cheng-2 wrote: > > Well, how many possible permutations are there? Is it possible to > model each case as a separate physical register? > > Evan > I don't think so. There are 4x4x4x4 = 256 permutations. For example: * xyzw: default * zxyw * yyyy: splat Even if I can model each of these 256 cases as a separate physical register, how can I model the use of r0.xyzw in
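
The 4x4x4x4 = 256 figure quoted in the snippet above can be checked with a short sketch. It is illustrative only and is not code from the thread; the names `all_swizzles` and `apply_swizzle` and the sample register values are hypothetical. Each of the four output lanes independently picks one of x, y, z, w, so repeats such as yyyy are allowed; strict permutations without repeats would number only 4! = 24.

```python
from itertools import product

# The four source components a swizzle can select from.
COMPONENTS = "xyzw"


def all_swizzles(width=4):
    """Return every swizzle string of the given width over x, y, z, w.

    Hypothetical helper: each output lane independently selects one of
    the four source components, so repeats (e.g. "yyyy") are included.
    """
    return ["".join(p) for p in product(COMPONENTS, repeat=width)]


def apply_swizzle(register, swizzle):
    """Select components of a 4-element register according to a swizzle string."""
    return tuple(register[COMPONENTS.index(c)] for c in swizzle)


swizzles = all_swizzles()
r0 = (1.0, 2.0, 3.0, 4.0)  # made-up register contents (x, y, z, w)

print(len(swizzles))               # 256
print(apply_swizzle(r0, "xyzw"))   # default: (1.0, 2.0, 3.0, 4.0)
print(apply_swizzle(r0, "zxyw"))   # reorder: (3.0, 1.0, 2.0, 4.0)
print(apply_swizzle(r0, "yyyy"))   # splat:   (2.0, 2.0, 2.0, 2.0)
```

Treating a swizzle as an operand modifier, as `apply_swizzle` does, is one way to avoid enumerating 256 register aliases; whether that fits a particular LLVM backend is a separate question that this sketch does not address.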

[LLVMdev] Alias in LLVM 3.0

2012 Feb 27

..., uint2); Both value-preserving and bit-preserving alias do the same thing for the above two cases. But here's an example of an alias where the results differ. It used to work with LLVM 2.9, but does not with LLVM 3.0... extern __attribute__((overloadable, weak, alias("__SH1I422"))) float2 shuffle(float2, uint2); In LLVM 2.9 and LLVM 3.0, our front-end generates: @__shuffle_2f32_2u32 = alias weak <2 x i32> (<2 x i32>, <2 x i32>)* @4 And the calls, before linking, look like: %call9 = call <2 x float> @__shuffle_2f32_2u32(<2 x float> %tmp7, <2 x i32>...

[LLVMdev] Alignment of vectors

2008 Jul 18

On Fri, Jul 18, 2008 at 6:45 AM, Benedict Gaster <benedict.gaster at amd.com> wrote: > Consider the following C code: > > typedef __attribute__(( ext_vector_type(2) )) float float2; > typedef __attribute__(( ext_vector_type(2) )) __attribute__(( aligned(4) )) AFAIK, the aligned attribute doesn't do anything on a typedef of anything other than a struct/union type in either gcc or clang. It would be possible to implement something like this, but someone would need to s...

[LLVMdev] Functions: sret and readnone

2009 Nov 06

Hi Stephan, > intrinsic float4 sample(int tex, float2 tc); > > float4 main(int tex, float2 tc) > { > float4 x = sample(tex, tc); > return 0.0; > } without additional information it would be wrong to remove the call to sample because it might write to a global variable. > As you can see, the call to the sample function is stil...

[LLVMdev] Functions: sret and readnone

2009 Oct 05

On Oct 5, 2009, at 7:21 AM, Stephan Reiter wrote: > Hi all, > > I'm currently building a DSL for a computer graphics project that is > not unlike NVIDIA's Cg. I have an intrinsic with the following > signature > > float4 sample(texture tex, float2 coords); > > that is translated to this LLVM IR code: > > declare void @"sample"(%float4* noalias nocapture sret, %texture, > $float2) nounwind readnone > > The type float4 is basically an array of four floats, which cannot be > returned directly on an x86 using th...

[LLVMdev] change type allocoted register

2010 Jan 04

Hi; I am using the LLVM backend on the x86 architecture. My app's ABI requires float2 (v2f32) to be passed as a parameter and returned in the XMM0 register. Currently LLVM handles v2f32 using the MMX register MM0. I wonder what changes I need to make in LLVM to support this: manipulating v2f32 (float2) using XMM and not MMX? One place I identified where a change needs to be made is X...

[LLVMdev] Alignment of vectors

2008 Jul 18

...On 18/07/2008 16:30, "Eli Friedman" <eli.friedman at gmail.com> wrote: > On Fri, Jul 18, 2008 at 6:45 AM, Benedict Gaster > <benedict.gaster at amd.com> wrote: >> Consider the following C code: >> >> typedef __attribute__(( ext_vector_type(2) )) float float2; >> typedef __attribute__(( ext_vector_type(2) )) __attribute__(( aligned(4) )) > > AFAIK, the aligned attribute doesn't do anything on a typedef of > anything other than a struct/union type in either gcc or clang. It > would be possible to implement something like this, but...

[LLVMdev] Loads moving across barriers

2013 Nov 08

Hi, For a long time we've been having a problem we've been working around in OpenCL where loads are moving across an intrinsic used for a barrier. Attached is the testcase, and the result of opt -S -basicaa -gvn on it. This example is essentially this: void foo(global float2* result, local float2* restrict data0, ...) { int id = get_local_id(0); // ... data0[id] = ...; barrier(); if (id < N) { float2 x = data0[idx]; int other_index = ...; data0[other_index] = x; } barrier(); result[id] = data...

[LLVMdev] Vector instructions

2008 Jun 27

...a > sufficiently generalized shufflevector would remove the need for > insertelement and extractelement to exist completely. You should look into how this works with clang. Clang allows you to do things like this, for example: typedef __attribute__(( ext_vector_type(4) )) float float4; float2 vec2, vec2_2; float4 vec4, vec4_2; float f; void test2() { vec2 = vec4.xy; // shorten f = vec2.x; // extract elt vec4 = vec4.yyyy; // splat vec4.zw = vec2; // insert } etc. It also offers operators to extract all the even or odd elements of a vector, do arbitrary...

[LLVMdev] Functions: sret and readnone

2009 Nov 06

...take a look at the code, please? Am I missing something here? class VISIBILITY_HIDDEN MySretAliasAnalysis : public FunctionPass, public AliasAnalysis { std::map<std::string, bool> _srets; public: static char ID; MySretAliasAnalysis() : FunctionPass(&ID) { _srets["sample$int$float2"] = true; _srets["sample$int$float3"] = true; } void getAnalysisUsage(llvm::AnalysisUsage &usage) const { AliasAnalysis::getAnalysisUsage(usage); usage.setPreservesAll(); } bool runOnFunction(Function &F) { AliasAnalysis::InitializeAliasAnalysis(this); retur...

[LLVMdev] Vector instructions

2008 Jun 27

Hi Dan, Thanks for your comments. I've responded inline below. On 26-Jun-08, at 6:49 PM, Dan Gohman wrote: > On Jun 26, 2008, at 1:56 PM, Stefanus Du Toit wrote: >> >> === >> 1. Shufflevector only accepts vectors of the same type >> >> I would propose to change the syntax from: >> >>> <result> = shufflevector <n x <ty>>
