thr3ads.net - similar to: "[LLVMdev] IndVar widening in IndVarSimplify causing performance regression on GPU programs"

Displaying 20 results from an estimated 200 matches similar to: "[LLVMdev] IndVar widening in IndVarSimplify causing performance regression on GPU programs"

[Mesa-dev] [RFC 0/9] Add precise/invariant semantics to TGSI

2017 Jun 13

On 13.06.2017 at 02:05, Ilia Mirkin wrote:
> On Mon, Jun 12, 2017 at 7:57 PM, Roland Scheidegger <sroland at vmware.com> wrote:
>> FWIW surely on nv50 you could keep a single mad instruction for umad
>> (sad maybe too?). (I'm actually wondering if the hw really can't do
>> unfused float multiply+add as a single instruction but I know next to
>> nothing

[Mesa-dev] [RFC 0/9] Add precise/invariant semantics to TGSI

2017 Jun 12

This looks like the right idea to me too. It may sound a bit weird to do that per instruction, but d3d11 does that as well. (Some d3d versions just have a global flag basically forbidding or allowing any such fast math optimizations in the assembly, but I'm not actually sure everybody honors that without tessellation...) For 1/9: Reviewed-by: Roland Scheidegger <sroland at vmware.com>

stochastic process transition probabilities estimation

2000 Apr 04

Hi all, I'm new to R (and S), and relatively new to statistics (I'm a computer scientist), so I apologize in advance if my question is silly. My problem is this: I have a (sample of a) discrete time stochastic process {X_t} and I want to estimate Pr{ X_t | X_{t-l_1}, X_{t-l_2}, ..., X_{t-l_k} } where l_1, l_2, ..., l_k are some fixed time lags. It will be enough for me to compute

[LLVMdev] Is va_arg correct on Mips backend?

2013 Feb 20

I don't have a Mips board. I compiled with the commands below and checked the asm output.
1. Question: The distance between the caller's arg[4] and arg[5] is 4 bytes, but the callee reads every arg[] at an 8-byte offset (arg_ptr1+8 or arg_ptr2+8). I assume that #BB#4 and #BB#5 are the arg_ptr, i.e. the pointer used to access the stack arguments.
2. Question: Stack memory 28($sp) has no initial value. If

[LLVMdev] Is va_arg correct on Mips backend?

2013 Feb 20

Does it make a difference if you give the "-target" option to clang?
$ clang -target mips-linux-gnu ch8_3.cpp -o ch8_3.bc -emit-llvm -c
The .s file generated this way looks quite different from the one in your email.
On Tue, Feb 19, 2013 at 5:06 PM, Jonathan <gamma_chen at yahoo.com.tw> wrote:
> I didn't have Mips board. I compile as the commands and check the asm

[LLVMdev] Is va_arg correct on Mips backend?

2013 Feb 19

Which part of the generated code do you think is not correct? Could you be more specific? I compiled this program with clang and ran it on a mips board. It returns the expected result (21).
On Tue, Feb 19, 2013 at 4:15 AM, Jonathan <gamma_chen at yahoo.com.tw> wrote:
> I check the Mips backend for the following C code fragment compile result.
> It seems not correct. Is it my

[LLVMdev] Is va_arg correct on Mips backend?

2013 Feb 19

I checked the result of compiling the following C code fragment with the Mips backend. It does not seem correct. Is it my misunderstanding, or is it a bug?

//ch8_3.cpp
#include <stdarg.h>

int sum_i(int amount, ...)
{
  int i = 0;
  int val = 0;
  int sum = 0;
  va_list vl;
  va_start(vl, amount);
  for (i = 0; i < amount; i++)
  {
    val = va_arg(vl, int);
    sum += val;
  }
  va_end(vl);

Rather poor code optimisation of current clang/LLVM targeting Intel x86 (both -64 and -32)

2018 Nov 06

Hi @ll, while clang/LLVM recognizes common bit-twiddling idioms/expressions like

unsigned int rotate(unsigned int x, unsigned int n)
{
    return (x << n) | (x >> (32 - n));
}

and typically generates "rotate" machine instructions for this expression, it fails to recognize other, also common, bit-twiddling idioms/expressions. The standard IEEE CRC-32 for "big

[LLVMdev] Using patterns inside patterns

2008 Oct 30

I am not sure what you are looking to do. Please provide a mark up example.
Evan
On Oct 28, 2008, at 11:00 AM, Villmow, Micah wrote:
> Is there currently a way to use a pattern inside of another pattern?
>
> Micah Villmow
> Systems Engineer
> Advanced Technology & Performance
> Advanced Micro Devices Inc.
> 4555 Great America Pkwy,
> Santa Clara, CA. 95054
> P:

[LLVMdev] LICM promoting memory to scalar

2014 Sep 02

All, If we can speculatively execute a load instruction, why isn’t it safe to hoist it out by promoting it to a scalar in LICM pass? There is a comment in LICM pass that if a load/store is conditional then it is not safe because it would break the LLVM concurrency model (See commit 73bfa4a). It has an IR test for checking this in test/Transforms/LICM/scalar-promote-memmodel.ll However, I have

Question about Reduce

2010 Oct 01

Hello! In the example below the Reduce() function by default assigns "a" to be the last accumulated value and "b" to be the current value in "x". I could not find this documented anywhere as the default settings for the Reduce() function. Does any sort of documentation for this behavior exist? x <- c(0,0,0,0,0,1,0,0,0,5,0,0,0,7,0,0,0,8,5,10)

[LLVMdev] Help with a Microblaze code generation problem.

2013 Oct 03

Sorry if this is a duplicate: I tried to send it last night and it didn't go through. I'm trimming some text to see if it helps. I have a simple program that fails on the Microblaze:

int main()
{
    unsigned long long x, y;
    x = 100;
    y = 0x8000000000000000ULL;
    return !(x > y);
}

As you can see, the test case compares two unsigned long long values. To try to track

KNL Assembly Code for Matrix Multiplication

2017 Jul 01

Thank you. It means that

vmovdqa64 zmm22, zmmword ptr [rip + .LCPI0_0] # zmm22 = [8,9,10,11,12,13,14,15]

loads zmm22 with 64-bit constant values that are indexes (here zmm22 = 8, 9, 10, 11, 12, 13, 14, 15), not the values loaded from these locations, and zmm2 contains the constant 4000. So

vpmuludq zmm14, zmm10, zmm2

will multiply the index values by 4000, since the stride for array b is 4000. zmm14=

A "hello world" coverage sanitizer

2016 Aug 13

Thank you, kcc. I am unsure if I misunderstand your reply. It seems that trace-bb, rather than trace-pc, fits better for my problem, given that my instrumentation is to put before each conditional statement. Do I misunderstand something here? " Tracing basic blocks <http://clang.llvm.org/docs/SanitizerCoverage.html#id11> With -fsanitize-coverage=trace-bb the compiler will insert

[LLVMdev] Modeling GPU vector registers, again (with my implementation)

2009 Feb 16

Evan Cheng-2 wrote:
>
> Well, how many possible permutations are there? Is it possible to
> model each case as a separate physical register?
>
> Evan
>
I don't think so. There are 4x4x4x4 = 256 permutations. For example:
* xyzw: default
* zxyw
* yyyy: splat
Even if I can model each of these 256 cases as a separate physical register, how can I model the use of r0.xyzw in

[LLVMdev] LICM promoting memory to scalar

2014 Sep 02

I think gcc is right. It inserted a branch for n == 0 (the cbz at the top), so that's not a problem. In all other regards, this is safe: if you examine the sequence of loads and stores, it eliminated all but the first load and all but the last store. How's that unsafe? If I had to guess, the bug here is that LLVM doesn't want to hoist the load over the condition (which it is right

Rather poor code optimisation of current clang/LLVM targeting Intel x86 (both -64 and -32)

2018 Nov 27

"Sanjay Patel" <spatel at rotateright.com> wrote: > IIUC, you want to use x86-specific bit-hacks (sbb masking) in cases like > this: > unsigned int foo(unsigned int crc) { > if (crc & 0x80000000) > crc <<= 1, crc ^= 0xEDB88320; > else > crc <<= 1; > return crc; > } To document this for x86 too: rewrite the function

[LLVMdev] Using patterns inside patterns

2008 Oct 28

Is there currently a way to use a pattern inside of another pattern?

Micah Villmow
Systems Engineer
Advanced Technology & Performance
Advanced Micro Devices Inc.
4555 Great America Pkwy,
Santa Clara, CA. 95054
P: 408-572-6219
F: 408-572-6596

ext3fs for kernel 2.4.2?

2001 Jul 26

Hello, does anybody know if there is an ext3fs patch for kernel 2.4.2-2, which is shipped with Red Hat 7.1? I was able to find patches only for 2.4.6 and 2.4.7. Please try to help ASAP. Best regards, Imad Ossaily.

Mail_quota plugin and LDAP on Dovecot 1.2

2011 Nov 18

Hi, I'm new to this list, but I have been using Dovecot for 6 years on Debian, from etch and lenny to squeeze now.

Package: dovecot-imapd
Version: 1:1.2.15-4
Tags: squeeze

-- System Information:
Debian Release: 6.0
APT prefers stable
APT policy: (500, 'stable')
Architecture: amd64 (x86_64)
Kernel: Linux 2.6.32-5-amd64 (SMP w/24 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8
