thr3ads.net - search: "float4"

Displaying 20 results from an estimated 57 matches for "float4".

[LLVMdev] Win64 Calling Convention problem

2009 Dec 03

...he following code is part of my host application that is compiled with Visual Studio 2005 in 64-bit debug mode. noise4 expects a structure of four floats as its first and only argument, which is - in accordance with the specs of the Win64 calling convention - passed by pointer. --- snip --- struct float4 { float x, y, z, w; } float noise4(float4 v) { 0000000140067AE0 mov qword ptr [rsp+8],rcx 0000000140067AE5 push rdi 0000000140067AE6 sub rsp,10h 0000000140067AEA mov rdi,rsp 0000000140067AED mov rcx,4 0000000140067AF7 mov eax,0CCCCCCCCh 00000001...

[LLVMdev] Functions: sret and readnone

2009 Oct 05

Hi all, I'm currently building a DSL for a computer graphics project that is not unlike NVIDIA's Cg. I have an intrinsic with the following signature float4 sample(texture tex, float2 coords); that is translated to this LLVM IR code: declare void @"sample"(%float4* noalias nocapture sret, %texture, $float2) nounwind readnone The type float4 is basically an array of four floats, which cannot be returned directly on an x86 using the traditio...

[LLVMdev] Failure to optimize vector select

2013 Aug 20

Hi, I've found a case I would expect would optimize easily, but it doesn't. A simple implementation of vector select: float4 simple_select(float4 a, float4 b, int4 c) { float4 result; result.x = c.x ? a.x : b.x; result.y = c.y ? a.y : b.y; result.z = c.z ? a.z : b.z; result.w = c.w ? a.w : b.w; return result; } I would expect this would be optimized to %bool = icmp eq <4 x i32> %c, 0 %r...
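For readers who want to compile the function quoted in this excerpt, here is a minimal sketch. The excerpt does not show how `float4` and `int4` were defined, so the plain-struct typedefs below are assumptions; the original post may well have used vector types instead.

```c
/* Assumed stand-ins: the excerpt does not show the original definitions. */
typedef struct { float x, y, z, w; } float4;
typedef struct { int x, y, z, w; } int4;

/* Component-wise select, as quoted in the excerpt:
   each lane takes a's value when c's lane is non-zero, otherwise b's. */
float4 simple_select(float4 a, float4 b, int4 c)
{
    float4 result;

    result.x = c.x ? a.x : b.x;
    result.y = c.y ? a.y : b.y;
    result.z = c.z ? a.z : b.z;
    result.w = c.w ? a.w : b.w;

    return result;
}
```

Any non-zero lane in `c` (including negative values) selects from `a`.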

[LLVMdev] About JIT by LLVM 2.9 or later

2011 Nov 02

Hello guys, Thanks for your help when you are busing. I am working on an open source project. It supports shader language and I want JIT feature, so LLVM is used. But now I find the ABI & Calling Convention did not co-work with MSVC. For example, following code I have: struct float4 { float x, y, z, w; }; struct float4x4 { float4 x, y, z, w; }; float4 fetch_vs( float4x4* mat ){ return mat->y; } Caller: // ... float4x4 mat; // Initialized float4 ret = fetch(mat); // fetch is JITed by LLVM float4 ret_vs = fetch_vs(mat) // ... Callee(LLVM): %vec4 = type { float, float, fl...

[LLVMdev] How to vectorize a vector type cast?

2012 Feb 28

Since Clang does not seem to allow type casts, such as uchar4 to float4, between vector types, it seems it is necessary to write them as element by element conversions, such as typedef float float4 __attribute__((ext_vector_type(4))); typedef unsigned char uchar4 __attribute__((ext_vector_type(4))); float4 to_float4(uchar4 in) { float4 out = {in.x, in.y, in.z, in.w...
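As a point of comparison with the element-by-element conversion in this excerpt, later Clang releases and GCC 9 or newer provide `__builtin_convertvector`, which converts a whole vector in one expression. The sketch below is not from the thread. It assumes a compiler that has the builtin, and it uses `vector_size` typedefs (an assumption, chosen so that both compilers accept them) rather than the `ext_vector_type` typedefs in the excerpt.

```c
/* Assumed typedefs: vector_size is used here instead of the excerpt's
   ext_vector_type so that both GCC and Clang accept the declarations. */
typedef float float4 __attribute__((vector_size(16)));
typedef unsigned char uchar4 __attribute__((vector_size(4)));

/* Convert each unsigned char lane to float in a single expression. */
float4 to_float4(uchar4 in)
{
    return __builtin_convertvector(in, float4);
}
```

With `vector_size` types the lanes are read with subscripts (`out[0]`) rather than `.x`-style member access.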

[LLVMdev] Making GEP into vector illegal?

2008 Oct 14

In Joe programmer language (i.e. C ;) ), are we basically talking about disallowing: float4 a; float* ptr_z = &a.z; ? Won't programmers just resort to: float4 a; float* ptr_z = (float*)(&a) + 3; ? On Oct 14, 2008, at 3:55 PM, Mon Ping Wang wrote: > Hi, > > Something like a sequential type makes sense especially in light of > what Duncan is point out. I agr...
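A note on the indices in this excerpt: if `float4` is a plain struct with members in the order x, y, z, w (an assumption; the thread is about vector types), then the cast form with `+ 3` addresses `w`, and `z` sits at index 2. The sketch below shows both forms; the helper names are hypothetical.

```c
#include <stddef.h>

/* Assumed layout: four floats in x, y, z, w order. */
typedef struct { float x, y, z, w; } float4;

/* Direct form from the excerpt: take the address of the member. */
float *ptr_z_direct(float4 *a)
{
    return &a->z;
}

/* Cast form from the excerpt, with the index made explicit.
   Index 2 is z and index 3 is w under the layout above. This relies on
   the struct having no padding, which holds on mainstream compilers. */
float *ptr_by_index(float4 *a, size_t index)
{
    return (float *)a + index;
}

/* Index of a member, computed from its byte offset. */
size_t z_index(void) { return offsetof(float4, z) / sizeof(float); }
size_t w_index(void) { return offsetof(float4, w) / sizeof(float); }
```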

[LLVMdev] Functions: sret and readnone

2009 Oct 06

...reason it needs to be an array? A vector of four floats > wouldn't have this problem, if that's an option. > Unfortunately that's not an option. At the moment I'm restricting myself to the use of scalar code only, in order to be able to vectorize the code easily later (e.g., float4 as it is now will then become an array of four vectors for parallel processing of n (probably 4, SSE) pixels). But thanks for coming up with this idea! Chris, I'll take a look at the AliasAnalysis functionality. Depending on how much effort it is to implement a solution I might follow this app...

[LLVMdev] Functions: sret and readnone

2009 Nov 05

...iasAnalysis::AccessesArguments in the pass' getModRefBehavior methods. However, I haven't been successful with this approach and hope that someone has an idea on how to fix this. Here's a step by step illustration of the problem: 1. The following source code is compiled ... intrinsic float4 sample(int tex, float2 tc); float4 main(int tex, float2 tc) { float4 x = sample(tex, tc); return 0.0; } 2. ... into the following LLVM code (after a bunch of optimizations have run): define void @"main$int$float2"([4 x float]* noalias nocapture sret, i32, [2 x float]) nounwind { %5...

[LLVMdev] Making GEP into vector illegal?

2008 Oct 14

On Tue, Oct 14, 2008 at 1:34 PM, Daniel M Gessel <gessel at apple.com> wrote: > In Joe programmer language (i.e. C ;) ), are we basically talking > about disallowing: > > float4 a; > float* ptr_z = &a.z; > > ? That's my reading as well; the argument for not allowing it is just to make optimization easier. We don't allow addressing individual bits either, and compilers obviously have to work around that for bitfields. AFAIK, both clang and gcc curre...

[LLVMdev] Resizing vector values

2010 Mar 25

...oduce a function operating on concrete vectors. For example, given vadd function operating on magic 17-element vectors: typedef float vfloat __attribute__((ext_vector_type(17))); vfloat vadd(vfloat a, vfloat b) { return a+b; } it should produce vadd operating on 4-element vectors: typedef float float4 __attribute__((ext_vector_type(4))); float4 vadd(float4 a, float4 b) { return a+b; } In other words, I only want to change the type from vfloat to float4 for arguments, result, instructions and operands (aka 'values'). Unfortunately, the type of an llvm::Value appears to be immutable? It...

[LLVMdev] Functions: sret and readnone

2009 Oct 05

On Oct 5, 2009, at 7:21 AM, Stephan Reiter wrote: > Hi all, > > I'm currently building a DSL for a computer graphics project that is > not unlike NVIDIA's Cg. I have an intrinsic with the following > signature > > float4 sample(texture tex, float2 coords); > > that is translated to this LLVM IR code: > > declare void @"sample"(%float4* noalias nocapture sret, %texture, > $float2) nounwind readnone > > The type float4 is basically an array of four floats, which cannot be > returned...

[LLVMdev] Failure to optimize vector select

2013 Aug 20

...ed running SLP vectorizer pass (-vectorize-slp)? Eugene On Mon, Aug 19, 2013 at 9:04 PM, Matt Arsenault <arsenm2 at gmail.com> wrote: > Hi, > > I've found a case I would expect would optimize easily, but it doesn't. A > simple implementation of vector select: > > float4 simple_select(float4 a, float4 b, int4 c) > { > float4 result; > > result.x = c.x ? a.x : b.x; > result.y = c.y ? a.y : b.y; > result.z = c.z ? a.z : b.z; > result.w = c.w ? a.w : b.w; > > return result; > } > > I would expect this would b...

[LLVMdev] UNREACHABLE executed! error while trying to generate PTX

2013 Mar 20

..._ __attribute__((address_space(2))) ^ /opt/cuda/include/host_defines.h:183:9: note: previous definition is here #define __constant__ \ ^ 1 warning generated. Another question is What about extern __shared__ ? I can see that the error goes away if I replace "extern __shared__ float4 sharedPos[]" with "__shared__ float4* sharedPos;". Do I have to dynamically allocate the shared memory by specifying size in kernel Launch? If so, why doesn't the second use of the same statement in another function cause the error ? I am using 3.2. -- View this message i...

[LLVMdev] Generalizing shuffle vector

2008 Sep 30

...neration. Today, building a vector for different lengths requires a sequence of extracting each element in the vector and inserting each element into a new vector. With this new form, it is more straightforward to write and reason about typedef __attribute__(( ext_vector_type(4) )) float float4; typedef __attribute__(( ext_vector_type(8) )) float float8; float8 f8; float4 f4a, f4b, f4c; f4a = f8.hi; f8.hi = f4b; f8.lo = f4c; where hi and lo represent the high half and low half of the vector. The outgoing IR is %f4a = shufflevector <8xf32>%f8, undef, <4xi32> <0, 1, 2,...

[LLVMdev] About JIT by LLVM 2.9 or later

2011 Nov 02

空明流转 <wuye9036 at gmail.com> writes: > Could I wrap LLVM with mingw and expose some C api to called by MSVC? > > And in mingw, I will override the signature float4 foo( float44 ) to > float4* foo( float4*, float44* ); ? > > Is that OK? If you pass and return the structs through pointers, you don't need MinGW in the middle, you can use MSVC directly. Please note that using MinGW will not fix the LLVM sret problem.
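The advice in this reply, passing and returning the structs through pointers, could look roughly like the sketch below. The struct definitions follow the earlier "About JIT" excerpt. The function name `fetch_vs_by_ptr` is hypothetical, and the sketch says nothing about how the thread's author actually changed the JIT-compiled signature.

```c
/* Struct layouts as given in the earlier "About JIT" excerpt. */
typedef struct { float x, y, z, w; } float4;
typedef struct { float4 x, y, z, w; } float4x4;

/* Hypothetical pointer-based variant of fetch_vs: the caller supplies the
   result storage, so no struct crosses the call boundary by value and the
   two compilers do not need to agree on a struct-return convention. */
float4 *fetch_vs_by_ptr(float4 *out, const float4x4 *mat)
{
    *out = mat->y;
    return out;
}
```

A caller would declare `float4 ret;` and call `fetch_vs_by_ptr(&ret, &mat);`.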

[LLVMdev] [GSoC] "Microsoft Direct3D shader bytecode backend" proposal

2011 Mar 29

...ate the struct members with metadata I'm leaning towards the latter one. > > Perhaps a simple example would be nice, showing a very simple LLVM IR > input and the (proposed) bytecode output. How about this (this is from http://www.neatware.com/lbstudio/web/hlsl.html)? struct a2v { float4 position : POSITION; }; struct v2p { float4 position : POSITION; }; void main(in a2v IN, out v2p OUT, uniform float4x4 ModelViewMatrix) { OUT.position = mul(IN.position, ModelViewMatrix); } This would generate something like this (assuming I took the metadata route): %struct.a2v = { <4 x...

[LLVMdev] Failure to optimize vector select

2013 Aug 20

Hi Matt, This code maintains a vector of float4 and it inserts and extracts values from this vector. The ’select’ operations are already vectorized. Maybe a sequence of inst-combines (or DAG-combines) can help. If you re-write this code using scalars then the slp-vectorizer, with some tweaks, will be able to catch it. Thanks, Nadav On Aug 20...

[LLVMdev] Making GEP into vector illegal?

2008 Oct 15

...ing with >> other >> vector types, and their scalar components. */ >> /* APPLE LOCAL 4505813 */ >> typedef long long __m64 __attribute__ ((__vector_size__ (8), >> __may_alias__)); > > This is actually completely different AFAIK, That statement was that: > float4 a; > float* ptr_z = (float*)(&a) + 3; ``violates strict aliasing`` That assertion is wrong. The docs says: @item may_alias Accesses to objects with types with this attribute are not subjected to type-based alias analysis, but are instead assumed to be able to alias any other type of obje...

[LLVMdev] Generalizing shuffle vector

2008 Sep 30

On Mon, Sep 29, 2008 at 8:11 PM, Mon Ping Wang <wangmp at apple.com> wrote: > The problem with generating insert and extracts is that we can generate poor > code > %tmp16 = extractelement <4 x float> %f4b, i32 0 > %f8a = insertelement <8 x float> %f8a, float %tmp16, i32 0 > %tmp18 = extractelement <4 x float> %f4b, i32 1 > %f8c

[LLVMdev] Failure to optimize vector select

2013 Aug 20

On Aug 20, 2013, at 10:22, Nadav Rotem <nrotem at apple.com> wrote: > Can you send the IR of the function ? Attached is the -O0 and -O3 IR. (Attachment scrubbed by the archive. Name: vselect_optimized.ll, Type: application/octet-stream, Size: 1545 bytes)
