Displaying 20 results from an estimated 500 matches similar to: "[LLVMdev] change type allocoted register"
2012 Nov 09
0
[LLVMdev] [NVPTX] llc -march=nvptx64 -mcpu=sm_20 generates invalid zero align for device function params
Dear all,
I'm attaching a patch that should fix the issue mentioned above. It
simply applies the same check that this file already performs for global
variables:
emitPTXAddressSpace(PTy->getAddressSpace(), O);
if (GVar->getAlignment() == 0)
O << " .align " << (int) TD->getPrefTypeAlignment(ETy);
else
O << " .align " <<
2012 Jul 11
2
[LLVMdev] [NVPTX] llc -march=nvptx64 -mcpu=sm_20 generates invalid zero align for device function params
Hello,
FYI, this is filed as a bug: http://llvm.org/bugs/show_bug.cgi?id=13324
When compiling the following code for sm_20, function parameters are for some
reason emitted with .align 0, which is invalid. The problem does not occur when
compiling for sm_10.
> cat test.ll
; ModuleID = '__kernelgen_main_module'
target datalayout = "e-p:64:64-i64:64:64-f64:64:64-n1:8:16:32:64"
target triple =
2011 Oct 20
0
[LLVMdev] Re : ANN: libclc (OpenCL C library implementation)
Hello,
I am the developer of Clover, and so much activity around OpenCL these days is really exciting. Here is my point of view, mainly on Clover and how the projects could use each other.
Clover is built in a way that allows a certain level of modularity. Although POCL would be very difficult to merge into Clover (or Clover into POCL), as these two projects do almost exactly the same thing
2009 Nov 05
0
[LLVMdev] Functions: sret and readnone
It's been a while and I finally had the time to look into this.
What I did was to build a custom AliasAnalysis pass, as Chris
suggested, that returns AliasAnalysis::Mod for values passed to the
sample function in the sret spot, and NoModRef for all other values.
I'm also returning AliasAnalysis::AccessesArguments in the pass'
getModRefBehavior methods. However, I haven't been
2012 Dec 13
2
[LLVMdev] Cross compile LLVM
Hi,
I am trying to cross-compile LLVM for the Android NDK.
I am using CMake as the build system, so I have defined the CMAKE_SYSTEM_NAME
variable to turn on the CMAKE_CROSSCOMPILING flag, which LLVM uses.
As far as I understand from the LLVM structure, when cross-compiling,
executables are generated in two versions: one goes under the target build
directory ${CMAKE_BINARY_DIR} and the other goes
2018 Apr 12
2
How to specify the RegisterClass of an IMPLICIT_DEF?
Hi,
I'm implementing build_vector as an IMPLICIT_DEF followed by INSERT_SUBREGs. This is the approach used for the SPARC architecture.
def : Pat<(build_vector (f32 fpimm:$a1), (f32 fpimm:$a2)),
(INSERT_SUBREG(INSERT_SUBREG (v2f32 (IMPLICIT_DEF)),
(i32 (COPY_TO_REGCLASS (MOVSUTO_A_iSLo (bitcast_fpimm_to_i32 f32:$a1)), FPUaOffsetClass)), A_UNIT_PART),
2009 Apr 15
0
[LLVMdev] Tablegen question
If I force it to use v2f32 for my register class, it still fails with:
d:\hq\main\sw\appeng\tools\hpc\opencl\compiler\llvm\test\AMDIL>TableGen.exe -gen-dag-isel -I../../include/ test.td > output
GPRV2F32:v2f32:$src1 MACRO_DISTANCE_FAST_v2f32: (set GPRF32:f32:$dst, (intrinsic_w_chain:f32 84:iPTR, GPRV2F32:v2f32:$src0, GPRV2F32:v2f32:$src1))
TableGen.exe: In
2012 Apr 24
2
[LLVMdev] RFC: ErLLVM - Implemented HiPE Calling Convention
This patch (and the others that will follow) is rebased on svn r155440:
"AVX2: The BLENDPW instruction selects between vectors of v16i16 using an i8
immediate. We can't use it here because the shuffle code does not check that
the lower part of the word is identical to the upper part"
Patch 1/3:
The attached commits add a new calling convention to support the LLVM backend
for
2009 Feb 16
0
[LLVMdev] Modeling GPU vector registers, again (with my implementation)
Alex,
In my experience working with GPU vector registers, there is no
support for swizzles in the manner that you would normally code them,
and in my case I have 6^4 permutations on src registers and 24
combinations in the dst registers. The way that I ended up handling this
was to have different register classes for 1, 2, 3 and 4 component
vectors. This made the generic cases very simple
2009 Apr 15
1
[LLVMdev] Tablegen question
On Apr 15, 2009, at 1:11 PM, Villmow, Micah wrote:
> If I force it to use v2f32 for my register class, it still fails with:
> d:\hq\main\sw\appeng\tools\hpc\opencl\compiler\llvm\test\AMDIL>TableGen.exe -gen-dag-isel -I../../include/ test.td > output
> GPRV2F32:v2f32:$src1 MACRO_DISTANCE_FAST_v2f32: (set GPRF32:f32:$dst, (intrinsic_w_chain:f32
2012 Feb 27
2
[LLVMdev] Alias in LLVM 3.0
We use aliases extensively in our library to support generating OpenCL code for both our CPUs and GPUs. During the transition to LLVM 3.0 with the new type system, we're seeing two problems. Both involve type conversions occurring across an alias.
In one case, one of the types is a pointer to an opaque type, and this triggers an assert in the verifier where it is checking that argument types
2018 Jun 20
2
Node deletion during DAG Combination ?
Hi,
I'm trying to optimize the 'extract_vector_elt' for my SIMD microcontroller.
The idea is, during DAG combination, to merge a load/extract sequence into an architecture-specific node.
During instruction selection, this node will then be selected to an architecture-specific instruction.
By 'combination of DAG nodes' I understand 'replacing a set of DAG nodes by
2009 Oct 05
5
[LLVMdev] Functions: sret and readnone
Hi all,
I'm currently building a DSL for a computer graphics project that is
not unlike NVIDIA's Cg. I have an intrinsic with the following
signature
float4 sample(texture tex, float2 coords);
that is translated to this LLVM IR code:
declare void @"sample"(%float4* noalias nocapture sret, %texture,
%float2) nounwind readnone
The type float4 is basically an array of four
2009 May 19
1
[LLVMdev] TableGen pattern
Hello,
I am trying to convert the subtree (vector_shuffle v2f32, v2f32
(build_vector imm1, imm2)) to a machine instruction that takes 2
v2f32's and 2 immediates. I tried the following TableGen pattern:
(set v2f32Reg:$dst, (vector_shuffle v2f32Reg:$src1, v2f32Reg:$src2,
(build_vector
imm:$c1, imm:$c2)))
TableGen barfs about type
2009 Apr 15
2
[LLVMdev] Tablegen question
On Apr 15, 2009, at 11:15 AM, Villmow, Micah wrote:
> I still think there is a bug somewhere, but not sure where yet.
> This is what is generated in intrinsic.gen:
> case Intrinsic::opencl_math_fdistance: // llvm.opencl.math.fdistance
> ResultTy = Type::FloatTy;
> ArgTys.push_back(Tys[0]);
> ArgTys.push_back(Tys[0]);
> break;
OK. That looks right to me.
2010 Jul 05
0
[LLVMdev] Vector promotions for calling conventions
The X86-64 calling convention (annoyingly) specifies that "struct x { float a,b,c,d; }" is passed or returned in the low 2 elements of two separate XMM registers. For example, returning that would return "a,b" in the low elements of XMM0 and "c,d" in the low elements of XMM1. Both llvm-gcc and clang currently generate atrocious IR for these structs, which you can
2007 Apr 23
3
[LLVMdev] Instruction pattern type inference problem
On Sun, 22 Apr 2007, Christopher Lamb wrote:
> 1. Is there a good reason that v2f32 types are excluded from the
> isFloatingPoint filter? Looks like a bug to me.
>
> v2f32 = 22, // 2 x f32
> v4f32 = 23, // 4 x f32 <== start ??
> v2f64 = 24, // 2 x f64 <== end
>
> static inline bool isFloatingPoint(ValueType VT) {
2009 Aug 24
2
create list entry from variable
Hi,
Assume i <- 10.
How can I create a list with key = 10 and value = 11?
list(i=11) generates a list with
'i'
[1] 11
and not
10
[1] 11
Any help?
Thanks
2009 Oct 05
0
[LLVMdev] Functions: sret and readnone
On Oct 5, 2009, at 7:21 AM, Stephan Reiter wrote:
> Hi all,
>
> I'm currently building a DSL for a computer graphics project that is
> not unlike NVIDIA's Cg. I have an intrinsic with the following
> signature
>
> float4 sample(texture tex, float2 coords);
>
> that is translated to this LLVM IR code:
>
> declare void @"sample"(%float4* noalias
2012 Mar 02
1
[LLVMdev] vector shuffle emulation/expand in backend?
I'm having some trouble adding vector support to our custom backend.
It seems that LLVM cannot emulate shuffles with extracts, inserts and builds?
I've enabled vector registers with
addRegisterClass(MVT::v2i32, TCE::V2I32RegsRegisterClass);
addRegisterClass(MVT::v2f32, TCE::V2F32RegsRegisterClass);
and created patterns for most vector instructions, including insert,
extract and