similar to: [LLVMdev] IndVar widening in IndVarSimplify causing performance regression on GPU programs

Displaying 20 results from an estimated 200 matches similar to: "[LLVMdev] IndVar widening in IndVarSimplify causing performance regression on GPU programs"

2017 Jun 13
1
[Mesa-dev] [RFC 0/9] Add precise/invariant semantics to TGSI
On 13.06.2017 at 02:05, Ilia Mirkin wrote: > On Mon, Jun 12, 2017 at 7:57 PM, Roland Scheidegger <sroland at vmware.com> wrote: >> FWIW surely on nv50 you could keep a single mad instruction for umad >> (sad maybe too?). (I'm actually wondering if the hw really can't do >> unfused float multiply+add as a single instruction but I know next to >> nothing
2017 Jun 12
3
[Mesa-dev] [RFC 0/9] Add precise/invariant semantics to TGSI
This looks like the right idea to me too. It may sound a bit weird to do that per instruction, but d3d11 does that as well. (Some d3d versions just have a global flag basically forbidding or allowing any such fast math optimizations in the assembly, but I'm not actually sure everybody honors that without tessellation...) For 1/9: Reviewed-by: Roland Scheidegger <sroland at vmware.com>
2000 Apr 04
0
stochastic process transition probabilities estimation
Hi all, I'm new to R (and S), and relatively new to statistics (I'm a computer scientist), so I apologize in advance if my question is silly. My problem is this: I have a (sample of a) discrete time stochastic process {X_t} and I want to estimate Pr{ X_t | X_{t-l_1}, X_{t-l_2}, ..., X_{t-l_k} } where l_1, l_2, ..., l_k are some fixed time lags. It will be enough for me to compute
2013 Feb 20
3
[LLVMdev] Is va_arg correct on Mips backend?
I don't have a Mips board. I compiled with the commands below and checked the asm output. 1. Question: The distance between the caller's arg[4] and arg[5] is 4 bytes. But the callee gets every arg[] at an 8-byte offset (arg_ptr1+8 or arg_ptr2+8). I assume the #BB#4 and #BB#5 are the arg_ptr which is the pointer to access the stack arguments. 2. Question: Stack memory 28($sp) has no initial value. If
2013 Feb 20
0
[LLVMdev] Is va_arg correct on Mips backend?
Does it make a difference if you give the "-target" option to clang? $ clang -target mips-linux-gnu ch8_3.cpp -o ch8_3.bc -emit-llvm -c The .s file generated this way looks quite different from the one in your email. On Tue, Feb 19, 2013 at 5:06 PM, Jonathan <gamma_chen at yahoo.com.tw> wrote: > I didn't have Mips board. I compile as the commands and check the asm >
2013 Feb 19
0
[LLVMdev] Is va_arg correct on Mips backend?
Which part of the generated code do you think is not correct? Could you be more specific? I compiled this program with clang and ran it on a mips board. It returns the expected result (21). On Tue, Feb 19, 2013 at 4:15 AM, Jonathan <gamma_chen at yahoo.com.tw> wrote: > I check the Mips backend for the following C code fragment compile result. > It seems not correct. Is it my
2013 Feb 19
2
[LLVMdev] Is va_arg correct on Mips backend?
I checked the Mips backend's compiled output for the following C code fragment. It does not seem correct. Is this my misunderstanding, or is it a bug? //ch8_3.cpp #include <stdarg.h> int sum_i(int amount, ...) { int i = 0; int val = 0; int sum = 0; va_list vl; va_start(vl, amount); for (i = 0; i < amount; i++) { val = va_arg(vl, int); sum += val; } va_end(vl);
2018 Nov 06
4
Rather poor code optimisation of current clang/LLVM targeting Intel x86 (both -64 and -32)
Hi @ll, while clang/LLVM recognizes common bit-twiddling idioms/expressions like unsigned int rotate(unsigned int x, unsigned int n) { return (x << n) | (x >> (32 - n)); } and typically generates "rotate" machine instructions for this expression, it fails to recognize other also common bit-twiddling idioms/expressions. The standard IEEE CRC-32 for "big
2008 Oct 30
0
[LLVMdev] Using patterns inside patterns
I am not sure what you are looking to do. Please provide a mark up example. Evan On Oct 28, 2008, at 11:00 AM, Villmow, Micah wrote: > Is there currently a way to use a pattern inside of another pattern? > > Micah Villmow > Systems Engineer > Advanced Technology & Performance > Advanced Micro Devices Inc. > 4555 Great America Pkwy, > Santa Clara, CA. 95054 > P:
2014 Sep 02
3
[LLVMdev] LICM promoting memory to scalar
All, If we can speculatively execute a load instruction, why isn’t it safe to hoist it out by promoting it to a scalar in LICM pass? There is a comment in LICM pass that if a load/store is conditional then it is not safe because it would break the LLVM concurrency model (See commit 73bfa4a). It has an IR test for checking this in test/Transforms/LICM/scalar-promote-memmodel.ll However, I have
2010 Oct 01
1
Question about Reduce
Hello! In the example below the Reduce() function by default assigns "a" to be the last accumulated value and "b" to be the current value in "x". I could not find this documented anywhere as the default settings for the Reduce() function. Does any sort of documentation for this behavior exist? x <- c(0,0,0,0,0,1,0,0,0,5,0,0,0,7,0,0,0,8,5,10)
2013 Oct 03
1
[LLVMdev] Help with a Microblaze code generation problem.
Sorry if this is a duplicate: I tried to send it last night and it didn't go through. I'm trimming some text to see if it helps. I have a simple program that fails on the Microblaze: int main() { unsigned long long x, y; x = 100; y = 0x8000000000000000ULL; return !(x > y); } As you can see, the test case compares two unsigned long long values. To try to track
2017 Jul 01
2
KNL Assembly Code for Matrix Multiplication
Thank you. It means vmovdqa64 zmm22, zmmword ptr [rip + .LCPI0_0] # zmm22 = [8,9,10,11,12,13,14,15] zmm22 will contain 64-bit constant values, which are indexes here (zmm22 = 8, 9, 10, 11, 12, 13, 14, 15), not the values loaded from these locations, and zmm2 contains the constant 4000. So, vpmuludq zmm14, zmm10, zmm2 ; will multiply the index values by 4000, as the stride for array b is 4000. zmm14=
2016 Aug 13
2
A "hello world" coverage sanitizer
Thank you, kcc. I am unsure if I misunderstand your reply. It seems that trace-bb, rather than trace-pc, is a better fit for my problem, given that my instrumentation is to be put before each conditional statement. Do I misunderstand something here? " Tracing basic blocks <http://clang.llvm.org/docs/SanitizerCoverage.html#id11> With -fsanitize-coverage=trace-bb the compiler will insert
2009 Feb 16
2
[LLVMdev] Modeling GPU vector registers, again (with my implementation)
Evan Cheng-2 wrote: > > Well, how many possible permutations are there? Is it possible to > model each case as a separate physical register? > > Evan > I don't think so. There are 4x4x4x4 = 256 permutations. For example: * xyzw: default * zxyw * yyyy: splat Even if I can model each of these 256 cases as a separate physical register, how can I model the use of r0.xyzw in
2014 Sep 02
2
[LLVMdev] LICM promoting memory to scalar
I think gcc is right. It inserted a branch for n == 0 (the cbz at the top), so that's not a problem. In all other regards, this is safe: if you examine the sequence of loads and stores, it eliminated all but the first load and all but the last store. How's that unsafe? If I had to guess, the bug here is that LLVM doesn't want to hoist the load over the condition (which it is right
2018 Nov 27
2
Rather poor code optimisation of current clang/LLVM targeting Intel x86 (both -64 and -32)
"Sanjay Patel" <spatel at rotateright.com> wrote: > IIUC, you want to use x86-specific bit-hacks (sbb masking) in cases like > this: > unsigned int foo(unsigned int crc) { > if (crc & 0x80000000) > crc <<= 1, crc ^= 0xEDB88320; > else > crc <<= 1; > return crc; > } To document this for x86 too: rewrite the function
2008 Oct 28
4
[LLVMdev] Using patterns inside patterns
Is there currently a way to use a pattern inside of another pattern? Micah Villmow Systems Engineer Advanced Technology & Performance Advanced Micro Devices Inc. 4555 Great America Pkwy, Santa Clara, CA. 95054 P: 408-572-6219 F: 408-572-6596
2001 Jul 26
1
ext3fs for kernel 2.4.2?
Hello, does anybody know if there is an ext3fs patch for kernel 2.4.2-2, which is shipped with Red Hat 7.1? I was able to find patches only for 2.4.6 and 2.4.7. Please try to help ASAP. Best regards, Imad Ossaily.
2011 Nov 18
2
Mail_quota plugin and LDAP on Dovecot 1.2
Hi, I'm new to this list, but I have been using Dovecot for 6 years on my Debian systems, from etch and lenny to squeeze now. Package: dovecot-imapd Version: 1:1.2.15-4 Tags: squeeze -- System Information: Debian Release: 6.0 APT prefers stable APT policy: (500, 'stable') Architecture: amd64 (x86_64) Kernel: Linux 2.6.32-5-amd64 (SMP w/24 CPU cores) Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8