similar to: [LLVMdev] Weird problems with cos (was Re: [PATCH v3 2/3] R600: Add carry and borrow instructions. Use them to implement UADDO/USUBO)

Displaying 20 results from an estimated 400 matches similar to: "[LLVMdev] Weird problems with cos (was Re: [PATCH v3 2/3] R600: Add carry and borrow instructions. Use them to implement UADDO/USUBO)"

2015 Nov 18
1
[Mesa-dev] llvm TGSI backend (WIP) questions
Hi, On 13-11-15 19:51, Tom Stellard wrote: > On Fri, Nov 13, 2015 at 02:46:52PM +0100, Hans de Goede wrote: >> Hi All, >> >> So as discussed I've started working on a TGSI backend for >> llvm to use as a way to get compute going on nouveau (and other gpu-s). >> >> I'm still learning all the ins and outs of llvm so I do not have >> much to show
2015 Nov 13
6
llvm TGSI backend (WIP) questions
Hi All, So as discussed I've started working on a TGSI backend for llvm to use as a way to get compute going on nouveau (and other gpu-s). I'm still learning all the ins and outs of llvm so I do not have much to show yet. I've rebased Francisco's (curro's) latest version on top of llvm trunk, and added a commit on top to actual get it build with the latest trunk. So
2005 Sep 17
1
[LLVMdev] Subword register allocation
Hi, I have a question about implementing subword register allocation problems (see the REFERENCES in the end of this message) on LLVM. I have algorithms, but don't know the best way to implement them in LLVM. I asked similar question before: http://lists.cs.uiuc.edu/pipermail/llvmdev/2005- May/004001.html Because I still don't have a satisfying solution now, I try to elaborate it
2014 Sep 05
2
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
Hi Chandler, While doing the performance measurement on a Ivy Bridge, I ran into compile time errors. I saw a bunch of “cannot select" in the LLVM test suite with -march=core-avx-i. E.g., SingleSource/UnitTests/Vector/SSE/sse.isamax.c is failing at O3 -march=core-avx-i with: fatal error: error in backend: Cannot select: 0x7f91b99a6420: v4i32 = bitcast 0x7f91b99b0e10 [ORD=3] [ID=27]
2009 Feb 16
2
[LLVMdev] Modeling GPU vector registers, again (with my implementation)
Evan Cheng-2 wrote: > > Well, how many possible permutations are there? Is it possible to > model each case as a separate physical register? > > Evan > I don't think so. There are 4x4x4x4 = 256 permutations. For example: * xyzw: default * zxyw * yyyy: splat Even if can model each of these 256 cases as a separate physical register, how can I model the use of r0.xyzw in
2014 Sep 06
2
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
I've run the SingleSource test suite for core-avx-i and have no failures here so a preprocessed file + commandline would be very useful if this reproduces for you still. On Sat, Sep 6, 2014 at 4:07 PM, Chandler Carruth <chandlerc at gmail.com> wrote: > I'm having trouble reproducing this. I'm trying to get LNT to actually > run, but manually compiling the given source
2009 Feb 13
0
[LLVMdev] Modeling GPU vector registers, again (with my implementation)
On Feb 13, 2009, at 9:47 AM, Alex wrote: > It seems to me that LLVM sub-register is not for the following > hardware architecture. > > All instructions of a hardware are vector instructions. All > registers contains > 4 32-bit FP sub-registers. They are called r0.x, r0.y, r0.z, r0.w. > > Most instructions write more than one elements in this way: > > mul
2005 May 06
3
[LLVMdev] avoid live range overlap of "vector" registers
a "vector" register r0 is composed of four 32-bit floating scalar registers, r0.x, r0.y, r0.z, r0.w. each scalar reg can be assigned individually, e.g. mov r0.x, r1.y add r0.y, r1,x, r2.z or assigned simultaneously with vector instructions, e.g. add r0.xyzw, r1.xzyw, r2.xyzw My question is how to define the register in .td file to avoid the code generator overlaps the
2009 Feb 13
3
[LLVMdev] Modeling GPU vector registers, again (with my implementation)
It seems to me that LLVM sub-register is not for the following hardware architecture. All instructions of a hardware are vector instructions. All registers contains 4 32-bit FP sub-registers. They are called r0.x, r0.y, r0.z, r0.w. Most instructions write more than one elements in this way: mul r0.xyw, r1, r2 add r0.z, r3, r4 sub r5, r0, r1 Notice that the four elements of r0 are written
2014 Sep 08
2
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
> On Sep 7, 2014, at 8:49 PM, Quentin Colombet <qcolombet at apple.com> wrote: > > Sure, > > Here is the command line: > clang -cc1 -triple x86_64-apple-macosx -S -disable-free -disable-llvm-verifier -main-file-name tmp.i -mrelocation-model pic -pic-level 2 -mdisable-fp-elim -masm-verbose -munwind-tables -target-cpu core-avx-i -O3 -ferror-limit 19 -fmessage-length 114
2005 May 10
0
[LLVMdev] avoid live range overlap of "vector" registers
On Fri, 6 May 2005, Tzu-Chien Chiu wrote: > a "vector" register r0 is composed of four 32-bit floating scalar > registers, r0.x, r0.y, r0.z, r0.w. > > each scalar reg can be assigned individually, e.g. > > mov r0.x, r1.y > add r0.y, r1,x, r2.z > > or assigned simultaneously with vector instructions, e.g. > > add r0.xyzw, r1.xzyw, r2.xyzw > > My
2005 Dec 15
3
[LLVMdev] Vector LLVM extension v.s. DirectX Shaders
Dear all: To write a compiler for Microsoft Direct3D shaders from our hardware, I have a program which translates the Direct3D shader assembly to LLVM assembly. I added several intrinsics for this purpose. It's a vector ISA and has some special instructions like: * rcp (reciprocal) * frc (the fractional portion of each input component) * dp4 (dot product) * exp (exponential) * max, min These
2017 Feb 15
4
multiprecision add/sub
I suggest that LLVM needs intrinsics for add/sub with carry, e.g. declare {T, i1} @llvm.addc.T(T %a, T %b, i1 c) The current multiprecision clang intrinsics example: void foo(unsigned *x, unsigned *y, unsigned *z) { unsigned carryin = 0; unsigned carryout; z[0] = __builtin_addc(x[0], y[0], carryin, &carryout); carryin = carryout; z[1] = __builtin_addc(x[1], y[1],
2017 Sep 28
1
rename multiple files by file.rename or other functions
Hi John, Thanks to Jim for pointing out the file.rename() function. You can try this: # define the filename templates strIn <- "XYZW--Genesis_ABC.mp3" strOut <- "01Gen--.mp3" # create the strings "01", "02", ..., "50" v <- sapply(1:50, function(i) sprintf("%02d",i) ) # perform all the file renames for ( s in v ) {
2014 Sep 09
5
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
Hi Chandler, Thanks for fixing the problem with the insertps mask. Generally the new shuffle lowering looks promising, however there are some cases where the codegen is now worse causing runtime performance regressions in some of our internal codebase. You have already mentioned how the new shuffle lowering is missing some features; for example, you explicitly said that we currently lack of
2010 May 14
2
[LLVMdev] vector optimization
Hi! Is there a pass that optimizes vector operations? If I have for examle a sequence of shufflevector instructions that optimizes them? (in opencl notation e.g. a.xyzw.wzyx.xxxx -> a.wwww) -Jochen
2007 Nov 13
3
cronbach's alpha
hie 1...i'm trying to carryout a relibility testusing cronbach's alpha what fuctin do i use. 2.. this is more of a statistical question.if the alpha value for all the variables is negative what does it mean. and if the alpha value is negative for all tyha variables but is greater than 0.7 for some sections of the variables what does that mean thanks in advance
2012 Oct 24
3
[LLVMdev] RegisterCoalescing Pass seems to ignore part of CFG.
Hi, I don't know if my llvm ir code is faulty, or if I spot a bug in the RegisterCoalescing Pass, so I'm posting my issue on the ML. Shader and print-before-all dump are given below. The interessing part is the vreg6/vreg48 reduction : before RegCoalescing, the machine code is : // BEFORE LOOP ... Some COPYs.... 400B%vreg47<def> = COPY %vreg2<kill>; R600_Reg32:%vreg47,%vreg2
2015 Dec 22
0
Translating tests/trivial/compute.c gallium tests to opencl (input / help wanted)
Hi All, I've been working on translating the tests/trivial/compute.c tests to opencl (for the buffer setup and kernel launch, I'm keeping the compute kernels in tgsi as an intermediate step). I've got the test_input_global() test working, see: https://fedorapeople.org/~jwrdegoede/compute-opencl-tgsi.c Next I wanted to convert the test_system_values() test and there I've gotten
2011 Jul 01
2
[LLVMdev] (no subject)
I'm trying to debug a problem with our custom backend with using a tiered register allocation setup. Just a little background. My target uses vec4 32bit registers and I want to have three levels of registers setup. Each vec4 register can have two sub-regs of size vec2 32bit, and each sub-reg, has its own two sub-regs of 32bit each. So it looks like this, xyzw -> {xy, zw} -> {x, y, z,