similar to: [LLVMdev] RE: LLVM extension v.s. DirectX Shaders

2005 Dec 15
3
[LLVMdev] Vector LLVM extension v.s. DirectX Shaders
Dear all: To write a compiler for Microsoft Direct3D shaders from our hardware, I have a program which translates the Direct3D shader assembly to LLVM assembly. I added several intrinsics for this purpose. It's a vector ISA and has some special instructions like: * rcp (reciprocal) * frc (the fractional portion of each input component) * dp4 (dot product) * exp (exponential) * max, min These
2005 Dec 15
0
[LLVMdev] Vector LLVM extension v.s. DirectX Shaders
On Thu, 15 Dec 2005, Tzu-Chien Chiu wrote: > To write a compiler for Microsoft Direct3D shaders from our hardware, > I have a program which translates the Direct3D shader assembly to LLVM > assembly. I added several intrinsics for this purpose. > It's a vector ISA and has some special instructions like: > * rcp (reciprocal) > * frc (the fractional portion of each input
2013 Oct 04
1
[LLVMdev] ADDE to use branch registers
Hi, I am working on an LLVM backend that has eight different branch registers. I am having a lot of trouble implementing the following instruction: addcg $r0.1, $b0.0 = $r0.1, $r0.1, $b0.0 (r is a general-purpose register and b is a 1-bit branch register). The branch register is used for carry in and carry out. I have noticed that this instruction is very closely related to the ADDE
2009 Feb 13
0
[LLVMdev] Modeling GPU vector registers, again (with my implementation)
On Feb 13, 2009, at 9:47 AM, Alex wrote: > It seems to me that LLVM sub-register is not for the following > hardware architecture. > > All instructions of a hardware are vector instructions. All > registers contain > 4 32-bit FP sub-registers. They are called r0.x, r0.y, r0.z, r0.w. > > Most instructions write more than one element in this way: > > mul
2009 Feb 13
3
[LLVMdev] Modeling GPU vector registers, again (with my implementation)
It seems to me that LLVM sub-register is not for the following hardware architecture. All instructions of a hardware are vector instructions. All registers contain 4 32-bit FP sub-registers. They are called r0.x, r0.y, r0.z, r0.w. Most instructions write more than one element in this way: mul r0.xyw, r1, r2 add r0.z, r3, r4 sub r5, r0, r1 Notice that the four elements of r0 are written
2002 Jan 30
1
Patch: update zlib/* to 1.1.3
This patch (apologies for the size) updates zlib/* to the files that ship with zlib 1.1.3. Index: zlib/ChangeLog =================================================================== RCS file: /cvsroot/rsync/zlib/ChangeLog,v retrieving revision 1.1 diff -u -r1.1 ChangeLog --- zlib/ChangeLog 7 May 1998 06:19:41 -0000 1.1 +++ zlib/ChangeLog 30 Jan 2002 01:12:41 -0000 @@ -1,6 +1,54 @@ ChangeLog
2016 Mar 29
0
IfConversion and representation of predicates
Hello, I have a few questions about applying the IfConversion pass to my out-of-tree target. (1) Is it true that the IfConversion pass may only run after register allocation? I often encounter this bad scenario, and I think it could be entirely avoided if IfConversion ran before register allocation: the block-to-be-predicated contains load-immediate (LI) instructions. The LI instructions
2002 Aug 29
0
PATCH: Fix IRIX 6 testsuite failures
Having built rsync 2.5.5 on IRIX 6.2 with gcc 3.1, I ran into two failures when running the testsuite with make check: both the chgrp and hardlinks tests fail: The failure in the chgrp test occurs here: + rsync -rtgvvv /amnt/callisto/volumes/obj-irix5/local/obj.irix5/rsync-2.5.5/testtmp.chgrp/from/ /amnt/callisto/volumes/obj-irix5/local/obj.irix5/rsync-2.5.5/testtmp.chgrp/to/ rsync: opendir
2003 Jun 20
0
[PATCH] Regression test portabilization.
Hi All. Attached is a patch (against OpenSSH Portable -current) to portablize the regression tests. It will also apply to OpenBSD's (with a couple of rejects). They are based on work by Roumen Petrov and myself, with contributions from Corinna Vinschen and David M Williams. My goal is to have the tests work out of the box on as many of our supported platforms as possible so running the
2003 Dec 12
0
proofreading corrections (cvs) (PR#5730)
Here is a patch of changes from the proof-reading of the R reference manual, made against the current cvs. regards -- Brian Gough Network Theory Ltd -- Publishing Free Software Manuals 15 Royal Park Bristol BS8 3AL United Kingdom Tel: +44 (0)117 3179309 Fax: +44 (0)117 9048108 Web: http://www.network-theory.co.uk/ Index: src/library/base/man/Arithmetic.Rd
2017 Mar 30
2
InstructionSimplify: adding a hook for shufflevector instructions
As Sanjay noted in D31426<https://reviews.llvm.org/D31426#712701>, InstructionSimplify is missing the following simplification: This function: define <4 x i32> @splat_operand(<4 x i32> %x) { %splat = shufflevector <4 x i32> %x, <4 x i32> undef, <4 x i32> zeroinitializer %shuf = shufflevector <4 x i32> %splat, <4 x i32> undef, <4 x i32>
2008 Sep 30
0
[LLVMdev] Generalizing shuffle vector
Hi Mon Ping, Generalizing shufflevector would be great. I have an additional suggestion below. On 29-Sep-08, at 11:11 PM, Mon Ping Wang wrote: > I am proposing to extend the shuffle vector definition to be > <result> = shufflevector <n x <ty>> <v1>, <n x <ty>> <v2>, <m x i32> > <mask> ; yields <m x <ty>> > > The
2019 Nov 27
2
LangRef semantics for shufflevector with undef mask is incorrect
Ok, makes sense. My suggestion is that we patch the IR Verifier to ensure that the mask is indeed a vector of constants and/or undefs. Right now it only runs the standard checks for instructions. We will also run Alive2 on the test suite to make sure undef is never replaced in practice. Thanks, Nuno -----Original Message----- From: Eli Friedman <efriedma at quicinc.com> Sent: 27 de
2017 Mar 14
3
llvm-stress crash
Hi, Using llvm-stress, I got a crash after Post-RA pseudo expansion, with machine verifier. A 128 bit register %vreg233:subreg_l32<def,read-undef> = LLCRMux %vreg119; GR128Bit:%vreg233 GRX32Bit:%vreg119 gets spilled: %vreg265:subreg_l32<def,read-undef> = LLCRMux %vreg119; GR128Bit:%vreg265 GRX32Bit:%vreg119 ST128 %vreg265, <fi#10>, 0, %noreg;
2019 Dec 09
2
[PATCH] D70246: [InstCombine] remove identity shuffle simplification for mask with undefs
Sanjay, I'm looking at some missed optimizations caused by D70246. Here's a test case: define <4 x float> @f(i32 %t32, <4 x float>* %t24) { .entry: %t43 = insertelement <3 x i32> undef, i32 %t32, i32 2 %t44 = bitcast <3 x i32> %t43 to <3 x float> %t45 = shufflevector <3 x float> %t44, <3 x float> undef, <4 x i32> <i32 0, i32 undef,
2008 Sep 30
0
[LLVMdev] Generalizing shuffle vector
Hi, I agree that the more general shufflevector is more useful. I narrowed the original proposal a little bit because of the concern for the implementation cost. However, the slightly narrowed definition will probably require falling back to generating inserts and extracts for complex masks, so it is possible that there will be no extra cost in supporting the more general definition.
2008 Sep 30
2
[LLVMdev] Generalizing shuffle vector
I agree further generalization seems like a very good idea. But I'd like to see what Mon Ping proposed implemented first so we have a better idea of the implementation cost. Thanks, Evan On Sep 30, 2008, at 6:44 AM, Stefanus Du Toit wrote: > Hi Mon Ping, > > Generalizing shufflevector would be great. I have an additional > suggestion below. > > On 29-Sep-08, at 11:11
2019 Nov 27
3
LangRef semantics for shufflevector with undef mask is incorrect
On 11/27/19 2:10 AM, Eli Friedman via llvm-dev wrote: The shuffle mask of a shufflevector is special: it's required to be a constant in a specific form. From LangRef: "The shuffle mask operand is required to be a constant vector with either constant integer or undef values." So really, we can resolve this any way we want; "undef" in this context doesn't have to mean
2003 Dec 09
1
documentation fixes (cvs) (PR#5632)
The patch below attempts to correct some unclear sentences in the R documentation. In the case of coplot.Rd it wasn't clear whether "shingle" bar had a special meaning or was a typo for "single". I've just put a comment in that case. regards -- Brian Gough Network Theory Ltd -- Publishing Free Software Manuals 15 Royal Park Bristol BS8 3AL United Kingdom Tel: +44
2009 Feb 16
0
[LLVMdev] Modeling GPU vector registers, again (with my implementation)
Alex, From my experience working with GPU vector registers, there is no support for swizzles in the manner that you would normally code them, and in my case I have 6^4 permutations on src registers and 24 combinations in the dst registers. The way that I ended up handling this was to have different register classes for 1-, 2-, 3- and 4-component vectors. This made the generic cases very simple