thr3ads.net - similar to: "[LLVMdev] [LSR] hoisting loop invariants in reverse order"

Displaying 20 results from an estimated 1000 matches similar to: "[LLVMdev] [LSR] hoisting loop invariants in reverse order"

[LLVMdev] [LSR] hoisting loop invariants in reverse order

2015 May 18

[LLVMdev] [LSR] hoisting loop invariants in reverse order

It's not caused by "the insertion point is set to the default after". I should mention the reason somewhere earlier. "Reversing the order of arg0~3 is not intentional. The user list of pixel_idx happens to have pixel_idx+3, pixel_idx+2, and pixel_idx+1 in this order, so LSR simply follows this order when collecting the LSRFixups." I'm not an expert on uselist orders,

[LLVMdev] loop vectorizer: Unexpected extract/insertelement

2013 Nov 06

[LLVMdev] loop vectorizer: Unexpected extract/insertelement

The following IR implements the following nested loop: for (int i = start ; i < end ; ++i ) for (int p = 0 ; p < 4 ; ++p ) a[i*4+p] = b[i*4+p] + c[i*4+p]; define void @main(i64 %arg0, i64 %arg1, i1 %arg2, i64 %arg3, float* noalias %arg4, float* noalias %arg5, float* noalias %arg6) { entrypoint: br i1 %arg2, label %L0, label %L1 L0:

[LLVMdev] loop vectorizer says Bad stride

2013 Oct 28

[LLVMdev] loop vectorizer says Bad stride

Verifying function running passes ... LV: Checking a loop in "bar" LV: Found a loop: L0 LV: Found an induction variable. LV: We need to do 0 pointer comparisons. LV: Checking memory dependencies LV: Bad stride - Not an AddRecExpr pointer %13 = getelementptr float* %arg2, i32 %1 SCEV: ((4 * (sext i32 {(256 + %arg0),+,1}<nw><%L0> to i64)) + %arg2) LV: Src Scev: {((4 * (sext

[DTrace] how to get socket read size

2007 Jan 10

[DTrace] how to get socket read size

Hi i''m trying to write my first dtrace script apparently i bit off a bit more than i can chew, i want to track io over sockets, i found your socketsize.d that gave me how to track writes, but i''m at a loss how to track reads, frankly i don''t see how your write tracker works because it uses a probe in a function that only takes two arguments but you grab size of write

[LLVMdev] loop vectorizer: Unexpected extract/insertelement

2013 Nov 06

[LLVMdev] loop vectorizer: Unexpected extract/insertelement

The loop vectorizer relies on cleanup passes to be run after it: from Transforms/IPO/PassManagerBuilder.cpp: // Add the various vectorization passes and relevant cleanup passes for // them since we are no longer in the middle of the main scalar pipeline. MPM.add(createLoopVectorizePass(DisableUnrollLoops)); MPM.add(createInstructionCombiningPass());

[LLVMdev] Modifications to SLP

2015 Jul 07

[LLVMdev] Modifications to SLP

Hi all! It takes the current SLP vectorizer too long to vectorize my scalar code. I am talking here about functions that have a single, huge basic block with O(10^6) instructions. Here's an example: %0 = getelementptr float* %arg1, i32 49 %1 = load float* %0 %2 = getelementptr float* %arg1, i32 4145 %3 = load float* %2 %4 = getelementptr float* %arg2, i32 49 %5 = load

[LLVMdev] loop vectorizer says Bad stride

2013 Oct 28

[LLVMdev] loop vectorizer says Bad stride

Frank, It looks like the loop vectorizer is unable to tell that the two stores in your code never overlap. This is probably because of the sign-extend in your code. Can you extend the indices to 64bit ? Thanks, Nadav On Oct 28, 2013, at 1:38 PM, Frank Winter <fwinter at jlab.org> wrote: > Verifying function > running passes ... > LV: Checking a loop in "bar" > LV:

[LLVMdev] loop vectorizer: this loop is not worth vectorizing

2013 Nov 01

[LLVMdev] loop vectorizer: this loop is not worth vectorizing

I am trying a setup where the one loop is rewritten as two loops. This avoids the 'rem' and 'div' instructions in the index calculation (which give the loop vectorizer a hard time). However, with this setup the loop vectorizer complains about a too small loop. LV: Checking a loop in "main" LV: Found a loop: L3 LV: Found a loop with a very small trip count. This loop

tracing aio syscalls

2009 Feb 18

tracing aio syscalls

Hi all, Is there some documentation or some example on how to interpret the arg0 .. arg<n> for the aioread, aiowrite, aiowait syscalls? The system call name for all three seems to be "kaio". Michael === Michael Mueller ================== Tel. + 49 8171 63600 Fax. + 49 8171 63615 Web: http://www.michael-mueller-it.de ======================================

[LLVMdev] loop vectorizer: Unexpected extract/insertelement

2013 Nov 06

[LLVMdev] loop vectorizer: Unexpected extract/insertelement

The instcombine pass cleans up a lot. Any idea why there are still shufflevector, insertelement, *and* bitcast (!!) etc. instructions left? The original loop is so clean, a textbook example I'd say. There is no need to shuffle anything.At least I don't see it. Frank vector.ph: ; preds = %L5 %broadcast.splatinsert1 = insertelement <4 x

[LLVMdev] loop vectorizer: this loop is not worth vectorizing

2013 Nov 01

[LLVMdev] loop vectorizer: this loop is not worth vectorizing

In the case when coming from C it was probably the loop unroller and SLP vectorizer which vectorized the code. Potentially I could do the same in the IR. However, the loop body that is generated in the IR can get very large. Thus, the loop unroller will refuse to unroll the loop in a large number of (important) cases. Isn't there a way to convince the loop vectorizer that it should

[LLVMdev] loop vectorizer: JIT + AVX segfaults

2013 Nov 11

[LLVMdev] loop vectorizer: JIT + AVX segfaults

For what it's worth, I'm also experiencing this same issue. If there is interest I can provide some very simple reproducible test cases, but I was planning on moving to MCJIT this week anyway. -- View this message in context: http://llvm.1065342.n5.nabble.com/loop-vectorizer-JIT-AVX-segfaults-tp63089p63115.html Sent from the LLVM - Dev mailing list archive at Nabble.com.

Using DTrace in 32-bit to handle 64-bit parameters [72631230]

2010 Mar 19

Using DTrace in 32-bit to handle 64-bit parameters [72631230]

Hi all, OK, so this at first looked like a clear cut "Don''t do it, or at worst handle the results" issue my customer has come to me with, but the more we discuss it, the more it looks like we should have better ways of dealing with this issue. > We have user defined dtrace probe points in the application which use > as parameter 64 bit values: > > provider adv {

4.20.0-rc3 nouveau/Quadro P2000 Mobile: runpm causing ACPI errors, lockups

2018 Nov 27

4.20.0-rc3 nouveau/Quadro P2000 Mobile: runpm causing ACPI errors, lockups

So a new thinkpad: 01:00.0 VGA compatible controller: NVIDIA Corporation GP107GLM [Quadro P2000 Mobile] (rev a1) Hangs whenever I try to poke at the card. It starts happily enough with [ 3.971515] ACPI Warning: \_SB.PCI0.GFX0._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20181003/nsarguments-66) [ 3.971553] ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4

[LLVMdev] loop vectorizer misses opportunity, exploit

2013 Oct 31

[LLVMdev] loop vectorizer misses opportunity, exploit

----- Original Message ----- > > Hi Nadav, > > that's the whole point of it. I can't in general make the index > calculation simpler. The example given is the simplest non-trivial > index function that is needed. It might well be that it's that > simple that the index calculation in this case can be thrown aways > altogether and - as you say - be replaced by

[RFC] design doc for straight-line scalar optimizations

2015 Aug 24

[RFC] design doc for straight-line scalar optimizations

Hi, As you may have noticed, since last year, we (Google's CUDA compiler team) have contributed quite a lot to the effort of optimizing LLVM for CUDA programs. I think it's worthwhile to write some docs to wrap them up for two reasons. 1) Whoever wants to understand or work on these optimizations has some detailed docs instead of just source code to refer to. 2) RFC on how to improve

[LLVMdev] loop vectorizer erroneously finds 256 bit vectors

2013 Nov 10

[LLVMdev] loop vectorizer erroneously finds 256 bit vectors

The loop vectorizer is doing an amazing job so far. Most of the time. I just came across one function which led to unexpected behavior: On this function the loop vectorizer finds a 256 bit vector as the wides vector type for the x86-64 architecture. (!) This is strange, as it was always finding the correct size of 128 bit as the widest type. I isolated the IR of the function to check if this is

[LLVMdev] MCJIT generates MOVAPS on unaligned address

2014 Aug 07

[LLVMdev] MCJIT generates MOVAPS on unaligned address

MCJIT when lowering to x86-64 generates a MOVAPS (Move Aligned Packed Single-Precision Floating-Point Values) on a non-aligned memory address: movaps 88(%rdx), %xmm0 where %rdx comes in as a function argument with only natural alignment (float*). This x86 instruction requires the memory address to be 16 byte aligned which 88 plus something aligned to 4 byte isn't. Here the

[RFC] design doc for straight-line scalar optimizations

2015 Aug 25

[RFC] design doc for straight-line scalar optimizations

Hi Escha, We certainly would love to generalize them as long as the performance doesn't suffer in general. If you have specific use cases that are regressed due to these optimizations, I am more than happy to take a look. On Mon, Aug 24, 2015 at 6:43 PM, escha <escha at apple.com> wrote: > > On Aug 24, 2015, at 11:10 AM, Jingyue Wu via llvm-dev < > llvm-dev at

[LLVMdev] How to broaden the SLP vectorizer's search

2014 Aug 07

[LLVMdev] How to broaden the SLP vectorizer's search

On 7 August 2014 17:33, Chad Rosier <mcrosier at codeaurora.org> wrote: > You might consider filing a bug (llvm.org/bugs) requesting a flag, but I > don't know if the code owners want to expose such a flag. I'm not sure that's a good idea as a raw access to that limit, as there are no guarantees that it'll stay the same. But maybe a flag turning some

similar to: [LLVMdev] [LSR] hoisting loop invariants in reverse order