thr3ads.net - similar to: "[LLVMdev] Functions: sret and readnone"

Displaying 20 results from an estimated 8000 matches similar to: "[LLVMdev] Functions: sret and readnone"

2009 Nov 05

[LLVMdev] Functions: sret and readnone

It's been a while and I finally had the time to look into this. What I did was to build a custom AliasAnalysis pass, as Chris suggested, that returns AliasAnalysis::Mod for values passed to the sample function in the sret spot, and NoModRef for all other values. I'm also returning AliasAnalysis::AccessesArguments in the pass' getModRefBehavior methods. However, I haven't been

[LLVMdev] Functions: sret and readnone

2009 Oct 05

[LLVMdev] Functions: sret and readnone

On Oct 5, 2009, at 7:21 AM, Stephan Reiter wrote: > Hi all, > > I'm currently building a DSL for a computer graphics project that is > not unlike NVIDIA's Cg. I have an intrinsic with the following > signature > > float4 sample(texture tex, float2 coords); > > that is translated to this LLVM IR code: > > declare void @"sample"(%float4* noalias

[LLVMdev] Functions: sret and readnone

2009 Oct 06

[LLVMdev] Functions: sret and readnone

On 5 Okt., 23:33, Dan Gohman <goh... at apple.com> wrote: > > Is there a reason it needs to be an array? A vector of four floats > wouldn't have this problem, if that's an option. > Unfortunately that's not an option. At the moment I'm restricting myself to the use of scalar code only, in order to be able to vectorize the code easily later (e.g., float4 as it is

[LLVMdev] Functions: sret and readnone

2009 Nov 06

[LLVMdev] Functions: sret and readnone

Hi Stephan, > intrinsic float4 sample(int tex, float2 tc); > > float4 main(int tex, float2 tc) > { > float4 x = sample(tex, tc); > return 0.0; > } without additional information it would be wrong to remove the call to sample because it might write to a global variable. > As you can see, the call to the sample function is still present, > although the actual value

[LLVMdev] [NVPTX] llc -march=nvptx64 -mcpu=sm_20 generates invalid zero align for device function params

2012 Jul 11

[LLVMdev] [NVPTX] llc -march=nvptx64 -mcpu=sm_20 generates invalid zero align for device function params

Hello, FYI, this is a bug http://llvm.org/bugs/show_bug.cgi?id=13324 When compiling the following code for sm_20, func params are by some reason given with .align 0, which is invalid. Problem does not occur if compiled for sm_10. > cat test.ll ; ModuleID = '__kernelgen_main_module' target datalayout = "e-p:64:64-i64:64:64-f64:64:64-n1:8:16:32:64" target triple =

[LLVMdev] Functions: sret and readnone

2009 Nov 06

[LLVMdev] Functions: sret and readnone

Duncan, thanks for your answer! > In order to perform this transform the optimizers would have to work out > that sample does not modify any global state. This cannot be done without > knowing the definition of sample, but you only provide a declaration. Which is why I am trying to supply this additional information in a custom alias analysis pass, but it doesn't seem to work. (The

[LLVMdev] Re : ANN: libclc (OpenCL C library implementation)

2011 Oct 20

[LLVMdev] Re : ANN: libclc (OpenCL C library implementation)

Hello, I am the developer of Clover, and so much activity about OpenCL these days is really exciting. Here is my point of view, mainly on Clover and how the projects could use each other. Clover is made in a way that allow a certain level of modularity. Although POCL would be very difficult to merge into Clover (or Clover into POCL), as these two projects are nearly exactly doing the same things

[LLVMdev] Vector instructions

2008 Jun 27

[LLVMdev] Vector instructions

On Jun 27, 2008, at 8:02 AM, Stefanus Du Toit wrote: >>>> <result> = shufflevector <a x <ty>> <v1>, <b x <ty>> <v2>, <d x >>>> i32> >>>> <mask> ; yields <d x <ty>> >>> >>> With the requirement that the entries in the (still constant) mask >>> are >>> within

[LLVMdev] How to vectorize a vector type cast?

2012 Feb 28

[LLVMdev] How to vectorize a vector type cast?

Since Clang does not seem to allow type casts, such as uchar4 to float4, between vector types, it seems it is necessary to write them as element by element conversions, such as typedef float float4 __attribute__((ext_vector_type(4))); typedef unsigned char uchar4 __attribute__((ext_vector_type(4))); float4 to_float4(uchar4 in) { float4 out = {in.x, in.y, in.z, in.w}; return out; } Running

[LLVMdev] Modeling GPU vector registers, again (with my implementation)

2009 Feb 16

[LLVMdev] Modeling GPU vector registers, again (with my implementation)

Alex, From my experience in working with GPU vector registers; there is no support for swizzles in the manner that you would normally code them, and in my case I have 6^4 permutations on src registers and 24 combinations in the dst registers. The way that I ended up handling this was to have different register classes for 1, 2, 3 and 4 component vectors. This made the generic cases very simple

[LLVMdev] Modeling GPU vector registers, again (with my implementation)

2009 Feb 16

[LLVMdev] Modeling GPU vector registers, again (with my implementation)

Evan Cheng-2 wrote: > > Well, how many possible permutations are there? Is it possible to > model each case as a separate physical register? > > Evan > I don't think so. There are 4x4x4x4 = 256 permutations. For example: * xyzw: default * zxyw * yyyy: splat Even if can model each of these 256 cases as a separate physical register, how can I model the use of r0.xyzw in

[LLVMdev] About JIT by LLVM 2.9 or later

2011 Nov 02

[LLVMdev] About JIT by LLVM 2.9 or later

Hello guys, Thanks for your help when you are busing. I am working on an open source project. It supports shader language and I want JIT feature, so LLVM is used. But now I find the ABI & Calling Convention did not co-work with MSVC. For example, following code I have: struct float4 { float x, y, z, w; }; struct float4x4 { float4 x, y, z, w; }; float4 fetch_vs( float4x4* mat

[LLVMdev] sret on scalars

2010 Aug 19

[LLVMdev] sret on scalars

I am needing to return i128 as a shadow return due to abi issues on alpha. The problem I am running into is the code for doing that with scalars (currently only used for vectors, as far as I can tell) sets the sret on the parameter. If I just go this path, then I am setting sret on an integer pointer, which verify objects too. LangRef doesn't say scalars are allowed to have sret set, but

[LLVMdev] sret on scalars

2010 Aug 23

[LLVMdev] sret on scalars

On Aug 19, 2010, at 1:38 PM, Andrew Lenharth wrote: > I am needing to return i128 as a shadow return due to abi issues on > alpha. The problem I am running into is the code for doing that with > scalars (currently only used for vectors, as far as I can tell) sets > the sret on the parameter. If I just go this path, then I am setting > sret on an integer pointer, which verify

[LLVMdev] Win64 Calling Convention problem

2009 Dec 03

[LLVMdev] Win64 Calling Convention problem

Hi! I have discovered a problem with LLVM's interpretation of the Win64 calling convention w.r.t. passing of aggregates as arguments. The following code is part of my host application that is compiled with Visual Studio 2005 in 64-bit debug mode. noise4 expects a structure of four floats as its first and only argument, which is - in accordance with the specs of the Win64 calling convention -

SRET consistency between declaration and call site

2015 Oct 06

SRET consistency between declaration and call site

On Oct 6, 2015, at 4:33 PM, Reid Kleckner via llvm-dev <llvm-dev at lists.llvm.org> wrote: > On Tue, Oct 6, 2015 at 1:21 PM, Joerg Sonnenberger via llvm-dev <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> wrote: > Can you give an example of where it would trigger in LTO and when should > not? > > You could imagine that __muldc3 might be

[LLVMdev] Question on sret

2010 Jul 14

[LLVMdev] Question on sret

Dear All, What is the purpose of the sret function parameter attribute? Does it affect the calling convention used during code generation, is it a hint to optimizations, or is it used for something else? I'm currently working on automatic pool allocation; this transform clones a function and adds extra parameters to the beginning of the function's parameter list. This means that I

[LLVMdev] Functions: sret and readnone

2009 Nov 06

[LLVMdev] Functions: sret and readnone

Hi Stephan, >> In order to perform this transform the optimizers would have to work out >> that sample does not modify any global state. This cannot be done without >> knowing the definition of sample, but you only provide a declaration. > > Which is why I am trying to supply this additional information in a > custom alias analysis pass, but it doesn't seem to

[LLVMdev] [NVPTX] llc -march=nvptx64 -mcpu=sm_20 generates invalid zero align for device function params

2012 Nov 09

[LLVMdev] [NVPTX] llc -march=nvptx64 -mcpu=sm_20 generates invalid zero align for device function params

Dear all, I'm attaching a patch that should fix the issue mentioned above. It simply makes the same check seen in the same file for global variables: emitPTXAddressSpace(PTy->getAddressSpace(), O); if (GVar->getAlignment() == 0) O << " .align " << (int) TD->getPrefTypeAlignment(ETy); else O << " .align " <<

Allow CallSlot optimization for throwing functions for sret arguments

2018 Mar 05

Allow CallSlot optimization for throwing functions for sret arguments

Hi all, in Rust we have a bug report about about a missed optimization which one would expect CallSlot optimization to handle: https://github.com/rust-lang/rust/issues/48533 The IR looks like this: define void @bar(%S* noalias nocapture sret dereferenceable(16), void (%S*)* nocapture nonnull) unnamed_addr #0 { %3 = alloca %S, align 8 %4 = bitcast %S* %3 to i8* call void

similar to: [LLVMdev] Functions: sret and readnone