similar to: [LLVMdev] [LSR] hoisting loop invariants in reverse order


2015 May 18
2
[LLVMdev] [LSR] hoisting loop invariants in reverse order
It's not caused by "the insertion point is set to the default after". I should mention the reason somewhere earlier. "Reversing the order of arg0~3 is not intentional. The user list of pixel_idx happens to have pixel_idx+3, pixel_idx+2, and pixel_idx+1 in this order, so LSR simply follows this order when collecting the LSRFixups." I'm not an expert on uselist orders,
2013 Nov 06
2
[LLVMdev] loop vectorizer: Unexpected extract/insertelement
The following IR implements the following nested loop: for (int i = start ; i < end ; ++i ) for (int p = 0 ; p < 4 ; ++p ) a[i*4+p] = b[i*4+p] + c[i*4+p]; define void @main(i64 %arg0, i64 %arg1, i1 %arg2, i64 %arg3, float* noalias %arg4, float* noalias %arg5, float* noalias %arg6) { entrypoint: br i1 %arg2, label %L0, label %L1 L0:
2013 Oct 28
2
[LLVMdev] loop vectorizer says Bad stride
Verifying function running passes ... LV: Checking a loop in "bar" LV: Found a loop: L0 LV: Found an induction variable. LV: We need to do 0 pointer comparisons. LV: Checking memory dependencies LV: Bad stride - Not an AddRecExpr pointer %13 = getelementptr float* %arg2, i32 %1 SCEV: ((4 * (sext i32 {(256 + %arg0),+,1}<nw><%L0> to i64)) + %arg2) LV: Src Scev: {((4 * (sext
2007 Jan 10
13
[DTrace] how to get socket read size
Hi, I'm trying to write my first DTrace script. Apparently I bit off a bit more than I can chew: I want to track I/O over sockets. I found your socketsize.d, which showed me how to track writes, but I'm at a loss how to track reads. Frankly, I don't see how your write tracker works, because it uses a probe in a function that only takes two arguments but you grab size of write
2013 Nov 06
0
[LLVMdev] loop vectorizer: Unexpected extract/insertelement
The loop vectorizer relies on cleanup passes to be run after it: from Transforms/IPO/PassManagerBuilder.cpp: // Add the various vectorization passes and relevant cleanup passes for // them since we are no longer in the middle of the main scalar pipeline. MPM.add(createLoopVectorizePass(DisableUnrollLoops)); MPM.add(createInstructionCombiningPass());
2015 Jul 07
2
[LLVMdev] Modifications to SLP
Hi all! It takes the current SLP vectorizer too long to vectorize my scalar code. I am talking here about functions that have a single, huge basic block with O(10^6) instructions. Here's an example: %0 = getelementptr float* %arg1, i32 49 %1 = load float* %0 %2 = getelementptr float* %arg1, i32 4145 %3 = load float* %2 %4 = getelementptr float* %arg2, i32 49 %5 = load
2013 Oct 28
0
[LLVMdev] loop vectorizer says Bad stride
Frank, It looks like the loop vectorizer is unable to tell that the two stores in your code never overlap. This is probably because of the sign-extend in your code. Can you extend the indices to 64bit ? Thanks, Nadav On Oct 28, 2013, at 1:38 PM, Frank Winter <fwinter at jlab.org> wrote: > Verifying function > running passes ... > LV: Checking a loop in "bar" > LV:
2013 Nov 01
2
[LLVMdev] loop vectorizer: this loop is not worth vectorizing
I am trying a setup where the one loop is rewritten as two loops. This avoids the 'rem' and 'div' instructions in the index calculation (which give the loop vectorizer a hard time). However, with this setup the loop vectorizer complains about a too small loop. LV: Checking a loop in "main" LV: Found a loop: L3 LV: Found a loop with a very small trip count. This loop
2009 Feb 18
4
tracing aio syscalls
Hi all, Is there some documentation or some example on how to interpret the arg0 .. arg<n> for the aioread, aiowrite, aiowait syscalls? The system call name for all three seems to be "kaio". Michael === Michael Mueller ================== Tel. + 49 8171 63600 Fax. + 49 8171 63615 Web: http://www.michael-mueller-it.de ======================================
2013 Nov 06
2
[LLVMdev] loop vectorizer: Unexpected extract/insertelement
The instcombine pass cleans up a lot. Any idea why there are still shufflevector, insertelement, *and* bitcast (!!) etc. instructions left? The original loop is so clean, a textbook example I'd say. There is no need to shuffle anything. At least I don't see it. Frank vector.ph: ; preds = %L5 %broadcast.splatinsert1 = insertelement <4 x
2013 Nov 01
0
[LLVMdev] loop vectorizer: this loop is not worth vectorizing
In the case when coming from C it was probably the loop unroller and SLP vectorizer which vectorized the code. Potentially I could do the same in the IR. However, the loop body that is generated in the IR can get very large. Thus, the loop unroller will refuse to unroll the loop in a large number of (important) cases. Isn't there a way to convince the loop vectorizer that it should
2013 Nov 11
2
[LLVMdev] loop vectorizer: JIT + AVX segfaults
For what it's worth, I'm also experiencing this same issue. If there is interest I can provide some very simple reproducible test cases, but I was planning on moving to MCJIT this week anyway. -- View this message in context: http://llvm.1065342.n5.nabble.com/loop-vectorizer-JIT-AVX-segfaults-tp63089p63115.html Sent from the LLVM - Dev mailing list archive at Nabble.com.
2010 Mar 19
2
Using DTrace in 32-bit to handle 64-bit parameters [72631230]
Hi all, OK, so this at first looked like a clear-cut "Don't do it, or at worst handle the results" issue my customer has come to me with, but the more we discuss it, the more it looks like we should have better ways of dealing with this issue. > We have user defined dtrace probe points in the application which use > as parameter 64 bit values: > > provider adv {
2018 Nov 27
2
4.20.0-rc3 nouveau/Quadro P2000 Mobile: runpm causing ACPI errors, lockups
So a new thinkpad: 01:00.0 VGA compatible controller: NVIDIA Corporation GP107GLM [Quadro P2000 Mobile] (rev a1) Hangs whenever I try to poke at the card. It starts happily enough with [ 3.971515] ACPI Warning: \_SB.PCI0.GFX0._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20181003/nsarguments-66) [ 3.971553] ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4
2013 Oct 31
3
[LLVMdev] loop vectorizer misses opportunity, exploit
----- Original Message ----- > > Hi Nadav, > > that's the whole point of it. I can't in general make the index > calculation simpler. The example given is the simplest non-trivial > index function that is needed. It might well be that it's that > simple that the index calculation in this case can be thrown away > altogether and - as you say - be replaced by
2015 Aug 24
4
[RFC] design doc for straight-line scalar optimizations
Hi, As you may have noticed, since last year, we (Google's CUDA compiler team) have contributed quite a lot to the effort of optimizing LLVM for CUDA programs. I think it's worthwhile to write some docs to wrap them up for two reasons. 1) Whoever wants to understand or work on these optimizations has some detailed docs instead of just source code to refer to. 2) RFC on how to improve
2013 Nov 10
3
[LLVMdev] loop vectorizer erroneously finds 256 bit vectors
The loop vectorizer is doing an amazing job so far. Most of the time. I just came across one function which led to unexpected behavior: On this function the loop vectorizer finds a 256 bit vector as the widest vector type for the x86-64 architecture. (!) This is strange, as it was always finding the correct size of 128 bit as the widest type. I isolated the IR of the function to check if this is
2014 Aug 07
3
[LLVMdev] MCJIT generates MOVAPS on unaligned address
MCJIT when lowering to x86-64 generates a MOVAPS (Move Aligned Packed Single-Precision Floating-Point Values) on a non-aligned memory address: movaps 88(%rdx), %xmm0 where %rdx comes in as a function argument with only natural alignment (float*). This x86 instruction requires the memory address to be 16 byte aligned which 88 plus something aligned to 4 byte isn't. Here the
2015 Aug 25
3
[RFC] design doc for straight-line scalar optimizations
Hi Escha, We certainly would love to generalize them as long as the performance doesn't suffer in general. If you have specific use cases that are regressed due to these optimizations, I am more than happy to take a look. On Mon, Aug 24, 2015 at 6:43 PM, escha <escha at apple.com> wrote: > > On Aug 24, 2015, at 11:10 AM, Jingyue Wu via llvm-dev < > llvm-dev at
2014 Aug 07
3
[LLVMdev] How to broaden the SLP vectorizer's search
On 7 August 2014 17:33, Chad Rosier <mcrosier at codeaurora.org> wrote: > You might consider filing a bug (llvm.org/bugs) requesting a flag, but I > don't know if the code owners want to expose such a flag. I'm not sure that's a good idea as a raw access to that limit, as there are no guarantees that it'll stay the same. But maybe a flag turning some