Displaying 20 results from an estimated 8000 matches similar to: "[LLVMdev] Functions: sret and readnone"
2009 Nov 05
0
[LLVMdev] Functions: sret and readnone
It's been a while and I finally had the time to look into this.
What I did was to build a custom AliasAnalysis pass, as Chris
suggested, that returns AliasAnalysis::Mod for values passed to the
sample function in the sret spot, and NoModRef for all other values.
I'm also returning AliasAnalysis::AccessesArguments in the pass'
getModRefBehavior methods. However, I haven't been
2009 Oct 05
0
[LLVMdev] Functions: sret and readnone
On Oct 5, 2009, at 7:21 AM, Stephan Reiter wrote:
> Hi all,
>
> I'm currently building a DSL for a computer graphics project that is
> not unlike NVIDIA's Cg. I have an intrinsic with the following
> signature
>
> float4 sample(texture tex, float2 coords);
>
> that is translated to this LLVM IR code:
>
> declare void @"sample"(%float4* noalias
2009 Oct 06
2
[LLVMdev] Functions: sret and readnone
On 5 Oct., 23:33, Dan Gohman <goh... at apple.com> wrote:
>
> Is there a reason it needs to be an array? A vector of four floats
> wouldn't have this problem, if that's an option.
>
Unfortunately that's not an option. At the moment I'm restricting
myself to the use of scalar code only, in order to be able to
vectorize the code easily later (e.g., float4 as it is
2009 Nov 06
2
[LLVMdev] Functions: sret and readnone
Hi Stephan,
> intrinsic float4 sample(int tex, float2 tc);
>
> float4 main(int tex, float2 tc)
> {
> float4 x = sample(tex, tc);
> return 0.0;
> }
without additional information it would be wrong to remove the call to
sample because it might write to a global variable.
> As you can see, the call to the sample function is still present,
> although the actual value
2012 Jul 11
2
[LLVMdev] [NVPTX] llc -march=nvptx64 -mcpu=sm_20 generates invalid zero align for device function params
Hello,
FYI, this is a bug http://llvm.org/bugs/show_bug.cgi?id=13324
When compiling the following code for sm_20, function parameters are for some
reason emitted with .align 0, which is invalid. The problem does not occur when
compiling for sm_10.
> cat test.ll
; ModuleID = '__kernelgen_main_module'
target datalayout = "e-p:64:64-i64:64:64-f64:64:64-n1:8:16:32:64"
target triple =
2009 Nov 06
0
[LLVMdev] Functions: sret and readnone
Duncan, thanks for your answer!
> In order to perform this transform the optimizers would have to work out
> that sample does not modify any global state. This cannot be done without
> knowing the definition of sample, but you only provide a declaration.
Which is why I am trying to supply this additional information in a
custom alias analysis pass, but it doesn't seem to work. (The
2011 Oct 20
0
[LLVMdev] Re : ANN: libclc (OpenCL C library implementation)
Hello,
I am the developer of Clover, and so much activity about OpenCL these days is really exciting. Here is my point of view, mainly on Clover and how the projects could use each other.
Clover is made in a way that allows a certain level of modularity. Although POCL would be very difficult to merge into Clover (or Clover into POCL), as these two projects are doing nearly exactly the same things
2008 Jun 27
0
[LLVMdev] Vector instructions
On Jun 27, 2008, at 8:02 AM, Stefanus Du Toit wrote:
>>>> <result> = shufflevector <a x <ty>> <v1>, <b x <ty>> <v2>, <d x
>>>> i32>
>>>> <mask> ; yields <d x <ty>>
>>>
>>> With the requirement that the entries in the (still constant) mask
>>> are
>>> within
2012 Feb 28
1
[LLVMdev] How to vectorize a vector type cast?
Since Clang does not seem to allow type casts, such as uchar4 to float4, between vector types, it seems it is necessary to write them as element by element conversions, such as
typedef float float4 __attribute__((ext_vector_type(4)));
typedef unsigned char uchar4 __attribute__((ext_vector_type(4)));
float4 to_float4(uchar4 in)
{
float4 out = {in.x, in.y, in.z, in.w};
return out;
}
Running
2009 Feb 16
0
[LLVMdev] Modeling GPU vector registers, again (with my implementation)
Alex,
From my experience in working with GPU vector registers; there is no
support for swizzles in the manner that you would normally code them,
and in my case I have 6^4 permutations on src registers and 24
combinations in the dst registers. The way that I ended up handling this
was to have different register classes for 1, 2, 3 and 4 component
vectors. This made the generic cases very simple
2009 Feb 16
2
[LLVMdev] Modeling GPU vector registers, again (with my implementation)
Evan Cheng-2 wrote:
>
> Well, how many possible permutations are there? Is it possible to
> model each case as a separate physical register?
>
> Evan
>
I don't think so. There are 4x4x4x4 = 256 permutations. For example:
* xyzw: default
* zxyw
* yyyy: splat
Even if I can model each of these 256 cases as a separate physical register,
how can I model the use of r0.xyzw in
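The count quoted in this excerpt can be checked with a short enumeration. This is only an illustrative sketch, assuming each of the four lanes independently selects one of the components x, y, z, w; the names COMPONENTS, swizzles and splats are hypothetical and do not come from the thread.

```python
from itertools import product

# Assumption: a 4-wide source swizzle picks one of x, y, z, w for each lane.
COMPONENTS = "xyzw"

# Enumerate every possible swizzle string, e.g. "xyzw", "zxyw", "yyyy".
swizzles = ["".join(p) for p in product(COMPONENTS, repeat=4)]

# Splats are the swizzles that repeat one component in all four lanes.
splats = [s for s in swizzles if len(set(s)) == 1]

print(len(swizzles))       # 4 * 4 * 4 * 4 = 256
print("zxyw" in swizzles)  # True
print(splats)              # ['xxxx', 'yyyy', 'zzzz', 'wwww']
```

The enumeration only confirms the arithmetic; it says nothing about how such swizzles should be modeled as register classes in a backend.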
2011 Nov 02
5
[LLVMdev] About JIT by LLVM 2.9 or later
Hello guys,
Thanks for your help while you are busy.
I am working on an open source project. It supports a shader language
and I want a JIT feature, so LLVM is used.
But now I find that the ABI and calling convention do not work together with MSVC.
For example, I have the following code:
struct float4 { float x, y, z, w; };
struct float4x4 { float4 x, y, z, w; };
float4 fetch_vs( float4x4* mat
2010 Aug 19
2
[LLVMdev] sret on scalars
I am needing to return i128 as a shadow return due to abi issues on
alpha. The problem I am running into is the code for doing that with
scalars (currently only used for vectors, as far as I can tell) sets
the sret on the parameter. If I just go this path, then I am setting
sret on an integer pointer, which the verifier objects to. LangRef doesn't
say scalars are allowed to have sret set, but
2010 Aug 23
0
[LLVMdev] sret on scalars
On Aug 19, 2010, at 1:38 PM, Andrew Lenharth wrote:
> I am needing to return i128 as a shadow return due to abi issues on
> alpha. The problem I am running into is the code for doing that with
> scalars (currently only used for vectors, as far as I can tell) sets
> the sret on the parameter. If I just go this path, then I am setting
> sret on an integer pointer, which the verifier
2009 Dec 03
4
[LLVMdev] Win64 Calling Convention problem
Hi!
I have discovered a problem with LLVM's interpretation of the Win64
calling convention w.r.t. passing of aggregates as arguments. The
following code is part of my host application that is compiled with
Visual Studio 2005 in 64-bit debug mode. noise4 expects a structure of
four floats as its first and only argument, which is - in accordance
with the specs of the Win64 calling convention -
2015 Oct 06
2
SRET consistency between declaration and call site
On Oct 6, 2015, at 4:33 PM, Reid Kleckner via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> On Tue, Oct 6, 2015 at 1:21 PM, Joerg Sonnenberger via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> Can you give an example of where it would trigger in LTO and when should
> not?
>
> You could imagine that __muldc3 might be
2010 Jul 14
2
[LLVMdev] Question on sret
Dear All,
What is the purpose of the sret function parameter attribute? Does it
affect the calling convention used during code generation, is it a hint
to optimizations, or is it used for something else?
I'm currently working on automatic pool allocation; this transform
clones a function and adds extra parameters to the beginning of the
function's parameter list. This means that I
2009 Nov 06
1
[LLVMdev] Functions: sret and readnone
Hi Stephan,
>> In order to perform this transform the optimizers would have to work out
>> that sample does not modify any global state. This cannot be done without
>> knowing the definition of sample, but you only provide a declaration.
>
> Which is why I am trying to supply this additional information in a
> custom alias analysis pass, but it doesn't seem to
2012 Nov 09
0
[LLVMdev] [NVPTX] llc -march=nvptx64 -mcpu=sm_20 generates invalid zero align for device function params
Dear all,
I'm attaching a patch that should fix the issue mentioned above. It
simply makes the same check seen in the same file for global
variables:
emitPTXAddressSpace(PTy->getAddressSpace(), O);
if (GVar->getAlignment() == 0)
O << " .align " << (int) TD->getPrefTypeAlignment(ETy);
else
O << " .align " <<
2018 Mar 05
1
Allow CallSlot optimization for throwing functions for sret arguments
Hi all,
in Rust we have a bug report about a missed optimization which
one would expect CallSlot optimization to handle:
https://github.com/rust-lang/rust/issues/48533
The IR looks like this:
define void @bar(%S* noalias nocapture sret dereferenceable(16), void
(%S*)* nocapture nonnull) unnamed_addr #0 {
%3 = alloca %S, align 8
%4 = bitcast %S* %3 to i8*
call void