Displaying 20 results from an estimated 57 matches for "float4".
2009 Dec 03
4
[LLVMdev] Win64 Calling Convention problem
...he
following code is part of my host application that is compiled with
Visual Studio 2005 in 64-bit debug mode. noise4 expects a structure of
four floats as its first and only argument, which is - in accordance
with the specs of the Win64 calling convention - passed by pointer.
--- snip ---
struct float4 { float x, y, z, w; };
float noise4(float4 v)
{
0000000140067AE0 mov qword ptr [rsp+8],rcx
0000000140067AE5 push rdi
0000000140067AE6 sub rsp,10h
0000000140067AEA mov rdi,rsp
0000000140067AED mov rcx,4
0000000140067AF7 mov eax,0CCCCCCCCh
00000001...
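The by-pointer passing described in this post can be sketched in plain C. This is only an illustration under stated assumptions: `win64_passed_in_register` and `noise4_lowered` are hypothetical names, and the body of `noise4_lowered` is a placeholder, not the poster's noise function. The size rule it encodes is the Win64 one: aggregates of 1, 2, 4, or 8 bytes travel in a register, anything else goes by pointer to a caller-allocated copy.

```c
#include <assert.h>
#include <stddef.h>

/* Same layout as the float4 in the post: four packed floats (16 bytes). */
struct float4 { float x, y, z, w; };

/* Hypothetical helper: Win64 passes an aggregate by value in a register
   only when its size is 1, 2, 4, or 8 bytes. */
static int win64_passed_in_register(size_t size)
{
    return size == 1 || size == 2 || size == 4 || size == 8;
}

/* Hand-lowered form of `float noise4(float4 v)`: the callee receives a
   pointer in rcx, which the prologue above spills to [rsp+8]. */
static float noise4_lowered(const struct float4 *v)
{
    assert(v != NULL);
    /* Placeholder body for illustration only. */
    return v->x + v->y + v->z + v->w;
}
```

A JIT that emits calls to such a host function would need to pass the address of a temporary copy rather than four floats in registers.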
2009 Oct 05
5
[LLVMdev] Functions: sret and readnone
Hi all,
I'm currently building a DSL for a computer graphics project that is
not unlike NVIDIA's Cg. I have an intrinsic with the following
signature
float4 sample(texture tex, float2 coords);
that is translated to this LLVM IR code:
declare void @"sample"(%float4* noalias nocapture sret, %texture,
%float2) nounwind readnone
The type float4 is basically an array of four floats, which cannot be
returned directly on an x86 using the traditio...
2013 Aug 20
2
[LLVMdev] Failure to optimize vector select
Hi,
I've found a case I would expect would optimize easily, but it doesn't. A simple implementation of vector select:
float4 simple_select(float4 a, float4 b, int4 c)
{
float4 result;
result.x = c.x ? a.x : b.x;
result.y = c.y ? a.y : b.y;
result.z = c.z ? a.z : b.z;
result.w = c.w ? a.w : b.w;
return result;
}
I would expect this would be optimized to
%bool = icmp eq <4 x i32> %c, 0
%r...
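As a rough illustration of the expected transformation, the per-component ternaries can be rewritten as one lane-wise compare against zero followed by one lane-wise select. The sketch below uses plain C structs as stand-ins for the vector types (an assumption; the post most likely uses real vector types), and `lanewise_select` is a hypothetical name.

```c
#include <stdint.h>

/* Struct stand-ins for the vector types in the post (assumption). */
typedef struct { float x, y, z, w; } float4;
typedef struct { int32_t x, y, z, w; } int4;

/* The scalar formulation from the post. */
static float4 simple_select(float4 a, float4 b, int4 c)
{
    float4 result;
    result.x = c.x ? a.x : b.x;
    result.y = c.y ? a.y : b.y;
    result.z = c.z ? a.z : b.z;
    result.w = c.w ? a.w : b.w;
    return result;
}

/* Lane-wise formulation: one "c == 0" test per lane, then a select that
   picks b where the test holds and a elsewhere. */
static float4 lanewise_select(float4 a, float4 b, int4 c)
{
    const float av[4] = { a.x, a.y, a.z, a.w };
    const float bv[4] = { b.x, b.y, b.z, b.w };
    const int32_t cv[4] = { c.x, c.y, c.z, c.w };
    float rv[4];
    for (int i = 0; i < 4; ++i) {
        int is_zero = (cv[i] == 0);       /* mirrors the icmp eq against 0 */
        rv[i] = is_zero ? bv[i] : av[i];  /* select(mask, b, a) */
    }
    float4 r = { rv[0], rv[1], rv[2], rv[3] };
    return r;
}
```

Both functions return the same value for every input; the second only makes the compare-then-select structure explicit.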
2011 Nov 02
5
[LLVMdev] About JIT by LLVM 2.9 or later
Hello guys,
Thanks for your help, even though you are busy.
I am working on an open source project. It supports a shader language
and I want a JIT feature, so I am using LLVM.
But now I find that the ABI and calling convention do not work together with MSVC.
For example, I have the following code:
struct float4 { float x, y, z, w; };
struct float4x4 { float4 x, y, z, w; };
float4 fetch_vs( float4x4* mat ){ return mat->y; }
Caller:
// ...
float4x4 mat; // Initialized
float4 ret = fetch(&mat); // fetch is JITed by LLVM
float4 ret_vs = fetch_vs(&mat);
// ...
Callee(LLVM):
%vec4 = type { float, float, fl...
2012 Feb 28
1
[LLVMdev] How to vectorize a vector type cast?
Since Clang does not seem to allow type casts, such as uchar4 to float4, between vector types, it seems it is necessary to write them as element by element conversions, such as
typedef float float4 __attribute__((ext_vector_type(4)));
typedef unsigned char uchar4 __attribute__((ext_vector_type(4)));
float4 to_float4(uchar4 in)
{
float4 out = {in.x, in.y, in.z, in.w...
2008 Oct 14
4
[LLVMdev] Making GEP into vector illegal?
In Joe programmer language (i.e. C ;) ), are we basically talking
about disallowing:
float4 a;
float* ptr_z = &a.z;
?
Won't programmers just resort to:
float4 a;
float* ptr_z = (float*)(&a) + 2;
?
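A small sketch of the cast-and-offset workaround, assuming float4 is laid out as four consecutive floats with no padding. A plain struct stands in for the vector type here, and `component_ptr` is a hypothetical helper. Note that z is the third component, so it sits at index 2; index 3 is w.

```c
#include <stddef.h>

/* Struct stand-in for the vector type under discussion (assumption). */
struct float4 { float x, y, z, w; };

/* Hypothetical helper: reach a component through a float* to the start
   of the object. Relies on the four floats being contiguous. */
static float *component_ptr(struct float4 *v, size_t index)
{
    return (float *)v + index;
}
```

For example, `component_ptr(&a, 2)` yields the same address as `&a.z` on the struct stand-in.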
On Oct 14, 2008, at 3:55 PM, Mon Ping Wang wrote:
> Hi,
>
> Something like a sequential type makes sense especially in light of
> what Duncan is pointing out. I agr...
2009 Oct 06
2
[LLVMdev] Functions: sret and readnone
...reason it needs to be an array? A vector of four floats
> wouldn't have this problem, if that's an option.
>
Unfortunately that's not an option. At the moment I'm restricting
myself to the use of scalar code only, in order to be able to
vectorize the code easily later (e.g., float4 as it is now will then
become an array of four vectors for parallel processing of n (probably
4, SSE) pixels). But thanks for coming up with this idea!
Chris, I'll take a look at the AliasAnalysis functionality. Depending
on how much effort it is to implement a solution I might follow this
app...
2009 Nov 05
0
[LLVMdev] Functions: sret and readnone
...iasAnalysis::AccessesArguments in the pass'
getModRefBehavior methods. However, I haven't been successful with
this approach and hope that someone has an idea on how to fix this.
Here's a step by step illustration of the problem:
1. The following source code is compiled ...
intrinsic float4 sample(int tex, float2 tc);
float4 main(int tex, float2 tc)
{
float4 x = sample(tex, tc);
return 0.0;
}
2. ... into the following LLVM code (after a bunch of optimizations
have run):
define void @"main$int$float2"([4 x float]* noalias nocapture sret,
i32, [2 x float]) nounwind {
%5...
2008 Oct 14
0
[LLVMdev] Making GEP into vector illegal?
On Tue, Oct 14, 2008 at 1:34 PM, Daniel M Gessel <gessel at apple.com> wrote:
> In Joe programmer language (i.e. C ;) ), are we basically talking
> about disallowing:
>
> float4 a;
> float* ptr_z = &a.z;
>
> ?
That's my reading as well; the argument for not allowing it is just to
make optimization easier. We don't allow addressing individual bits
either, and compilers obviously have to work around that for
bitfields.
AFAIK, both clang and gcc curre...
2010 Mar 25
0
[LLVMdev] Resizing vector values
...oduce a function operating on concrete vectors. For
example, given vadd function operating on magic 17-element vectors:
typedef float vfloat __attribute__((ext_vector_type(17)));
vfloat vadd(vfloat a, vfloat b) { return a+b; }
it should produce vadd operating on 4-element vectors:
typedef float float4 __attribute__((ext_vector_type(4)));
float4 vadd(float4 a, float4 b) { return a+b; }
In other words, I only want to change the type from vfloat to float4 for
arguments, result, instructions and operands (aka 'values'). Unfortunately,
the type of an llvm::Value appears to be immutable?
It...
2009 Oct 05
0
[LLVMdev] Functions: sret and readnone
On Oct 5, 2009, at 7:21 AM, Stephan Reiter wrote:
> Hi all,
>
> I'm currently building a DSL for a computer graphics project that is
> not unlike NVIDIA's Cg. I have an intrinsic with the following
> signature
>
> float4 sample(texture tex, float2 coords);
>
> that is translated to this LLVM IR code:
>
> declare void @"sample"(%float4* noalias nocapture sret, %texture,
> %float2) nounwind readnone
>
> The type float4 is basically an array of four floats, which cannot be
> returned...
2013 Aug 20
0
[LLVMdev] Failure to optimize vector select
...ed running SLP vectorizer pass (-vectorize-slp)?
Eugene
On Mon, Aug 19, 2013 at 9:04 PM, Matt Arsenault <arsenm2 at gmail.com> wrote:
> Hi,
>
> I've found a case I would expect would optimize easily, but it doesn't. A
> simple implementation of vector select:
>
> float4 simple_select(float4 a, float4 b, int4 c)
> {
> float4 result;
>
> result.x = c.x ? a.x : b.x;
> result.y = c.y ? a.y : b.y;
> result.z = c.z ? a.z : b.z;
> result.w = c.w ? a.w : b.w;
>
> return result;
> }
>
> I would expect this would b...
2013 Mar 20
2
[LLVMdev] UNREACHABLE executed! error while trying to generate PTX
..._ __attribute__((address_space(2)))
^
/opt/cuda/include/host_defines.h:183:9: note: previous definition is here
#define __constant__ \
^
1 warning generated.
Another question is
What about extern __shared__ ?
I can see that the error goes away if I replace "extern __shared__ float4
sharedPos[]" with "__shared__ float4* sharedPos;". Do I have to dynamically
allocate the shared memory by specifying size in kernel Launch? If so, why
doesn't the second use of the same statement in another function cause the
error ?
I am using 3.2.
--
View this message i...
2008 Sep 30
4
[LLVMdev] Generalizing shuffle vector
...neration.
Today, building a vector for different lengths requires a sequence of
extracting each element in the vector and inserting each element into
a new vector. With this new form, it is more straightforward to write
and reason about
typedef __attribute__(( ext_vector_type(4) )) float float4;
typedef __attribute__(( ext_vector_type(8) )) float float8;
float8 f8;
float4 f4a, f4b, f4c;
f4a = f8.hi;
f8.hi = f4b; f8.lo = f4c;
where hi and lo represent the high half and low half of the vector.
The outgoing IR is
%f4a = shufflevector <8xf32>%f8, undef, <4xi32> <0, 1, 2,...
2011 Nov 02
0
[LLVMdev] About JIT by LLVM 2.9 or later
空明流转 <wuye9036 at gmail.com> writes:
> Could I wrap LLVM with mingw and expose some C API to be called by MSVC?
>
> And in mingw, I will override the signature float4 foo( float44 ) to
> float4* foo( float4*, float44* ); ?
>
> Is that OK?
If you pass and return the structs through pointers, you don't need
MinGW in the middle, you can use MSVC directly. Please note that using
MinGW will not fix the LLVM sret problem.
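A minimal sketch of the pointer-based signature suggested here, so that neither side depends on how the other passes or returns structs by value. The name `fetch_ptr` is hypothetical, and the struct definitions are assumed to match the float4 and float4x4 from the original question.

```c
/* Assumed to match the structs in the original question. */
typedef struct { float x, y, z, w; } float4;
typedef struct { float4 x, y, z, w; } float4x4;

/* Both the result and the argument travel through pointers, so no struct
   is passed or returned by value across the MSVC/LLVM boundary. */
static float4 *fetch_ptr(float4 *out, const float4x4 *mat)
{
    *out = mat->y;  /* same work as `return mat->y;` in the by-value form */
    return out;
}
```

The caller owns the storage for the result: `float4 ret; fetch_ptr(&ret, &mat);`.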
2011 Mar 29
2
[LLVMdev] [GSoC] "Microsoft Direct3D shader bytecode backend" proposal
...ate the struct members with metadata
I'm leaning towards the latter one.
>
> Perhaps a simple example would be nice, showing a very simple LLVM IR
> input and the (proposed) bytecode output.
How about this (this is from
http://www.neatware.com/lbstudio/web/hlsl.html)?
struct a2v {
float4 position : POSITION;
};
struct v2p {
float4 position : POSITION;
};
void main(in a2v IN, out v2p OUT, uniform float4x4 ModelViewMatrix) {
OUT.position = mul(IN.position, ModelViewMatrix);
}
This would generate something like this (assuming I took the metadata
route):
%struct.a2v = { <4 x...
2013 Aug 20
0
[LLVMdev] Failure to optimize vector select
Hi Matt,
This code maintains a vector of float4 and it inserts and extracts values from this vector. The ’select’ operations are already vectorized. Maybe a sequence of inst-combines (or DAG-combines) can help. If you re-write this code using scalars then the slp-vectorizer, with some tweaks, will be able to catch it.
Thanks,
Nadav
On Aug 20...
2008 Oct 15
3
[LLVMdev] Making GEP into vector illegal?
...ing with
>> other
>> vector types, and their scalar components. */
>> /* APPLE LOCAL 4505813 */
>> typedef long long __m64 __attribute__ ((__vector_size__ (8),
>> __may_alias__));
>
> This is actually completely different AFAIK,
That statement was that:
> float4 a;
> float* ptr_z = (float*)(&a) + 2;
``violates strict aliasing``
That assertion is wrong. The docs say:
@item may_alias
Accesses to objects with types with this attribute are not subjected to
type-based alias analysis, but are instead assumed to be able to alias
any other type of obje...
2008 Sep 30
0
[LLVMdev] Generalizing shuffle vector
On Mon, Sep 29, 2008 at 8:11 PM, Mon Ping Wang <wangmp at apple.com> wrote:
> The problem with generating insert and extracts is that we can generate poor
> code
> %tmp16 = extractelement <4 x float> %f4b, i32 0
> %f8a = insertelement <8 x float> %f8a, float %tmp16, i32 0
> %tmp18 = extractelement <4 x float> %f4b, i32 1
> %f8c
2013 Aug 20
3
[LLVMdev] Failure to optimize vector select
On Aug 20, 2013, at 10:22 , Nadav Rotem <nrotem at apple.com> wrote:
> Can you send the IR of the function ?
Attached is the -O0 and -O3 IR
-------------- next part --------------
A non-text attachment was scrubbed...
Name: vselect_optimized.ll
Type: application/octet-stream
Size: 1545 bytes