thr3ads.net - similar to: "[LLVMdev] State of Loop Unrolling and Vectorization in LLVM"

[LLVMdev] Improving the usability of LNT

2013 Apr 30

Hi Daniel,

I made some changes to the LNT perf reporting tool to make it more user friendly by adding some features:

1. Make the sidebar and the navigation bar stationary, so that it is easy to navigate the site
2. Have the drop-down menus for the items in the navigation bar activate on hover, rather than on click
3. Add a nav-link in the sidebar for the

[LLVMdev] Trip count and Loop Vectorizer

2013 Sep 27

Hi, I am trying to get a small loop to *not vectorize* for cases where it doesn't make sense. For instance, this loop:

    void foo(int a[4][8], int n) {
      int b[4][8];
      for (int i = 0; i < 4; i++) {
        for (int j = 0; j < n; j++) {
          a[i][j] = b[i][j];
        }
      }
    }

* Has a maximum of 8 ints to copy. LLVM tries to use memcpy for the inner loop. It is not helpful to perform

[LLVMdev] Improving the usability of LNT

2013 May 02

Wow, that sounds great! Thanks for working on this, and yes, please, send the patches! --renato

On 30 April 2013 16:23, Murali, Sriram <sriram.murali at intel.com> wrote:
> Hi Daniel,
>
> I made some changes to the LNT perf reporting tool to make it more user
> friendly by adding some features:
>
> 1. Make the sidebar and the navigation bar stationary,

[LLVMdev] Trip count and Loop Vectorizer

2013 Sep 27

Hi Sriram, Thanks for performing this analysis. The problem here, both for memcpy and the vectorizer, is that we can’t predict the size of “n”, even though the only use of “n” is for the loop bound for the alloca [4 x [8 x i32]]. If you change the unroll condition to TC >= 0 then you will disable loop unrolling for all loops because getSmallConstantTripCount returns an unsigned number. You

[LLVMdev] Trip count and Loop Vectorizer

2013 Sep 27

Hi Nadav, Thanks for the response. I forgot to mention that there is an upper limit of 16 for the Trip Count check: TinyTripCountVectorThreshold = 16; if (TC > 0u && TC < TinyTripCountVectorThreshold). So right now, for any loop with a Trip Count of 0, or with a value >= 16, LV will unroll. With the change to the lower bound, it will also include the loop with 0 trip count. SCEV returns 0
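The condition quoted in this exchange can be looked at in isolation. What follows is a minimal standalone sketch, not the LoopVectorize source: the function names isTinyTripCount and isTinyTripCountWidened are hypothetical, and a trip count of 0 is assumed to mean "not known at compile time", as the messages describe. Because TC is unsigned, a lower bound of TC >= 0u is always true, so the widened form reduces to the upper-bound test alone.

```cpp
#include <cassert>

// Threshold value quoted in the thread.
constexpr unsigned TinyTripCountVectorThreshold = 16;

// Hypothetical helper mirroring the quoted condition:
// a known, small trip count (1..15) is treated as "tiny".
constexpr bool isTinyTripCount(unsigned TC) {
  return TC > 0u && TC < TinyTripCountVectorThreshold;
}

// Hypothetical variant with the lower bound changed to TC >= 0u.
// For an unsigned value that bound is vacuously true, so only the
// upper bound remains, and TC == 0 (unknown) now counts as "tiny".
constexpr bool isTinyTripCountWidened(unsigned TC) {
  return TC < TinyTripCountVectorThreshold;
}

// Small self-check showing where the two forms disagree.
inline void checkTripCountExamples() {
  assert(!isTinyTripCount(0));        // unknown trip count is not tiny
  assert(isTinyTripCountWidened(0));  // widened form also catches 0
  assert(isTinyTripCount(8) && isTinyTripCountWidened(8));
  assert(!isTinyTripCount(16) && !isTinyTripCountWidened(16));
}
```

The two forms differ only at TC == 0, which is exactly the case under discussion.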

[LLVMdev] Question about CriticalAntiDepBreaker.cpp

2012 Apr 09

In the course of implementing the instruction scheduler for the Intel Atom in LLVM, I have run across a problem with the critical anti-dependence breaker, whereby the CriticalAntiDepBreaker.cpp code changes some XMM0 references to be XMM9 references. This would be all well and good, were it not for the fact that the result of the expression needs to be in XMM0 because it is being returned as the

[LLVMdev] Trip count and Loop Vectorizer

2013 Sep 27

Sriram, The problem is that you want to unroll/vectorize many loops with non-constant loop count - it is a trade-off of which case you estimate as more likely. int foo(int *ptr, int n) { for ( .. i <n) ptr[i] = ... } The question is: is it more likely to have “n” such that unrolling is beneficial or not. Now, you could probably write an analysis that bounds the loop count (for the

[LLVMdev] Calling with register indirect reference instead of memory indirect reference.

2013 Feb 28

Hi, I am working on a small optimization feature that replaces calls made through a memory-indirect reference with calls made through a register-indirect reference. The purpose of this feature is to improve the performance of calls to functions referred to by function pointers. The motivation behind this work is that gcc does this optimization. Here is a small test case, that will generate an indirect call

[LLVMdev] How to locate the start if an address mode in an X86 MachineInstr?

2012 Sep 20

My team is interested in doing some post-RA optimizations on X86 instructions, which would require identifying memory reference instructions. In the X86 back end instructions, memory addresses consist of a set of five operands. The offset to the start of the five operands depends on the format of the instruction. For instance, the instructions ADC32rm, ADD32rm, AND32rm, ANDN32rm, CMOVA32rm,
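The idea of a five-operand address group at a format-dependent offset can be modelled with a toy data structure. This is a sketch under stated assumptions, not LLVM code: ToyInstr, AddrField, getAddrField and makeExampleInstr are hypothetical names, and the field order (base, scale, index, displacement, segment) is assumed here for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed order of the five address operands within the group.
enum AddrField : std::size_t {
  Base = 0, Scale = 1, Index = 2, Disp = 3, Segment = 4,
  NumAddrOperands = 5
};

// Hypothetical stand-in for a machine instruction: a flat operand list
// plus the offset at which the five address operands begin. In a real
// backend that offset would be derived from the instruction format.
struct ToyInstr {
  std::vector<long> Operands;
  std::size_t MemOpStart;
};

// Fetch one component of the address, relative to the group start.
inline long getAddrField(const ToyInstr &MI, AddrField F) {
  assert(MI.MemOpStart + NumAddrOperands <= MI.Operands.size());
  return MI.Operands[MI.MemOpStart + F];
}

// Example: two leading register operands, then the address group
// {base=7, scale=4, index=9, disp=16, segment=0}.
inline ToyInstr makeExampleInstr() {
  return ToyInstr{{100, 101, 7, 4, 9, 16, 0}, 2};
}
```

With this layout, a pass only needs the group's starting offset to read any address component, which is why locating that offset per instruction format is the crux of the question.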

[LLVMdev] RFC: change BoundsChecking.cpp to use address-based tests

2012 Nov 26

I am investigating changing BoundsChecking to use address-based rather than size- & offset-based tests. To explain, here is a short code sample cribbed from one of the tests: %mem = tail call i8* @calloc(i64 1, i64 %elements) %memobj = bitcast i8* %mem to i64* %ptr = getelementptr inbounds i64* %memobj, i64 %index %4 = load i64* %ptr, align 8 Currently, the IR for bounds checking
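To make the size- and offset-based style concrete, here is a minimal sketch of such a test applied to the IR sample above. It is an illustration only, not the BoundsChecking implementation: the function names are hypothetical, and the mapping (Size = %elements bytes from calloc, Offset = %index * 8, access width = 8 bytes for an i64 load) is read off the sample.

```cpp
#include <cassert>
#include <cstdint>

// Size- and offset-based test: the access covers bytes
// [Offset, Offset + NeededSize) of an object that is Size bytes long.
inline bool inBoundsSizeOffset(uint64_t Size, int64_t Offset,
                               uint64_t NeededSize) {
  if (Offset < 0)
    return false;                     // before the start of the object
  uint64_t Off = static_cast<uint64_t>(Offset);
  if (Size < Off)
    return false;                     // starts past the end
  return Size - Off >= NeededSize;    // enough bytes remain
}

// The sample's load: calloc(1, Elements) gives Elements bytes, and
// indexing an i64* by Index gives a byte offset of Index * 8.
// Overflow of Index * 8 is ignored here for brevity.
inline bool loadIsInBounds(uint64_t Elements, int64_t Index) {
  return inBoundsSizeOffset(Elements, Index * 8, 8);
}

inline void checkSizeOffsetExamples() {
  assert(loadIsInBounds(32, 3));   // bytes 24..31 of 32
  assert(!loadIsInBounds(32, 4));  // bytes 32..39 are past the end
  assert(!loadIsInBounds(32, -1)); // negative offset
}
```

An address-based test would compare the pointer against the object's start and end addresses instead of tracking Size and Offset separately.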

[LLVMdev] Disabling x87 instructions for a sub-target

2012 Apr 04

Hi Sriram, I'm not sure if I understand your question correctly: Do you need to generate code that contains no x87 floating-point instructions altogether, but uses calls into a soft-float library instead? That behaviour can be enabled using the "-soft-float" flag, as far as I know. Or is it only about the fcomi* instructions, which are not supported by pre-Pentium Pro chips? Then I

[LLVMdev] RFC: change BoundsChecking.cpp to use address-based tests

2012 Dec 04

Nuno, Inspired by this email thread, I spent a bit of time today looking through the implementation of BoundsChecking::instrument(..). Based on my reading of prior work, it should be possible to do these checks in two comparisons, or possibly even one if the right assumptions could be made. Could you provide a bit of background of the expected domains of Size and Offset? In particular,
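One way a sign test might be folded into an unsigned comparison is sketched below. This is an assumption offered for illustration, not what the thread concluded: if Size is known to be below 2^63, a negative Offset reinterpreted as unsigned is at least 2^63 and therefore larger than any valid Size, so a separate "Offset < 0" comparison becomes redundant. The function name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Two-comparison bounds test, valid only under the stated assumption
// that Size < 2^63. A negative Offset converts to a value >= 2^63,
// so "Off <= Size" rejects it without a dedicated sign check.
inline bool inBoundsTwoCompares(uint64_t Size, int64_t Offset,
                                uint64_t NeededSize) {
  assert(Size < (uint64_t{1} << 63) && "sketch assumes Size < 2^63");
  uint64_t Off = static_cast<uint64_t>(Offset);
  return Off <= Size && Size - Off >= NeededSize;
}
```

Whether that assumption holds depends on the domains of Size and Offset, which is the question the message raises.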

[LLVMdev] RFC: change BoundsChecking.cpp to use address-based tests

2012 Dec 04

Hi,

> Could you provide a bit of background of the expected domains of Size and
> Offset? In particular, are they signed or unsigned integers? A
> non-negative size doesn't seem to make much sense in this context, but
> depending on how it's calculated I could see it arising. Is a zero Size
> something that might arise here? I'm assuming the Offset comes from an

[LLVMdev] RFC: change BoundsChecking.cpp to use address-based tests

2012 Nov 26

Hi Kevin, Thanks for your interest and for your deep analysis. Unfortunately, your approach doesn't catch all bugs and is vulnerable to an attack. Consider the following case:

[Diagram: an object "obj" placed at the end of memory, with markers for "ptr", "end", and "end-of-memory"]

The scenario is as follows:
- an object is allocated in the last page of the address space
- obj is byte
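The message is cut off before the scenario is complete, so the following is only one plausible illustration of how an address-based test can go wrong near the top of the address space: computing Ptr + AccessSize may wrap around to a small value and slip under the end-of-object comparison. The function names and the example addresses are hypothetical, and 64-bit integers stand in for pointers.

```cpp
#include <cassert>
#include <cstdint>

// Naive address-based test. Ptr + AccessSize can wrap past zero.
inline bool naiveAddressCheck(uint64_t Base, uint64_t End, uint64_t Ptr,
                              uint64_t AccessSize) {
  return Ptr >= Base && Ptr + AccessSize <= End;
}

// Wrap-safe variant: subtract instead of add, after checking that
// Ptr lies within [Base, End] so End - Ptr cannot underflow.
inline bool wrapSafeAddressCheck(uint64_t Base, uint64_t End, uint64_t Ptr,
                                 uint64_t AccessSize) {
  return Ptr >= Base && Ptr <= End && End - Ptr >= AccessSize;
}

// Hypothetical 8-byte object that starts 16 bytes below the top of a
// 64-bit address space.
constexpr uint64_t ExampleBase = UINT64_MAX - 15;
constexpr uint64_t ExampleEnd = ExampleBase + 8;

inline void checkWraparoundExample() {
  // A 16-byte access starting 4 bytes into the object overruns it,
  // but Ptr + 16 wraps to 4, which compares below ExampleEnd.
  assert(naiveAddressCheck(ExampleBase, ExampleEnd, ExampleBase + 4, 16));
  assert(!wrapSafeAddressCheck(ExampleBase, ExampleEnd, ExampleBase + 4, 16));
}
```

Both checks agree on accesses that fit inside the object; they differ only when the addition overflows.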

[LLVMdev] Disabling x87 instructions for a sub-target

2012 Apr 04

Hello there, I recently started working on the LLVM backend for a target that doesn't support x87 instructions. Currently, I am in the process of completely disabling some x87 instructions such as fcomi, fcompi,... for a specific sub-target. I also do not have SSE enabled for my sub-target, and llvm resorts to fcomi* instructions for FP compare instructions. Is there a way to bypass the

[LLVMdev] Packed instructions generaetd by LoopVectorize?

2013 Apr 03

Hi, I have a question about LoopVectorize. I wrote a simple test case, a dot product loop, and found that packed instructions are generated when the input arrays are integer, but not when they are float or double. If I modify the float example in http://llvm.org/docs/Vectorizers.html by adding restrict to the input arrays, packed instructions are generated. Although it should not be required, I tried

[LLVMdev] Public SmallVectorImpl constructor?

2012 Jan 20

I've had the same thought but never got around to trying to implement it. Does everything compile for you if it's protected? If so, then a patch would probably be happily accepted.

------------------------------
From: Vane, Edwin
Sent: 1/20/2012 7:13 AM
To: llvmdev at cs.uiuc.edu
Subject: [LLVMdev] Public SmallVectorImpl constructor?

Hi all, Just finished debugging a memory

[LLVMdev] [Patch][Review Requested][Compilation Time] Avoid frequent copy of elements in LoopStrengthReduce

2013 Jan 29

On Tue, Jan 29, 2013 at 3:59 PM, Murali, Sriram <sriram.murali at intel.com> wrote:
> Our benchmark results show that the compilation time performance improved by
> ~0.5%.

That's fairly small; what was the standard deviation, confidence interval, etc?

-- Sean Silva

[LLVMdev] Trip count and Loop Vectorizer

2013 Sep 27

Hey Arnold, I have run into this situation many times while benchmarking. I think it is best if this is addressed using a simple heuristic. For that, we need to identify the loop cost and decide if it makes sense to completely unroll the loop, or partially unroll. I am unsure of the optimal way to implement this though. I want to run it by the list to get any ideas floating around :) Thanks

[LLVMdev] Packed instructions generaetd by LoopVectorize?

2013 Apr 04

Thanks, that did it! Are there any plans to enable the loop vectorizer by default?

From: Nadav Rotem [mailto:nrotem at apple.com]
Sent: Wednesday, April 03, 2013 13:33 PM
To: Nowicki, Tyler
Cc: LLVM Developers Mailing List
Subject: Re: Packed instructions generaetd by LoopVectorize?

Hi Tyler, Try adding -ffast-math. We can only vectorize reduction variables if it is safe to reorder floating
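Why reordering matters for a floating-point reduction can be shown with a small standalone sketch. The functions and the input array below are hypothetical; sumTwoLanes only imitates the shape of a two-wide vectorized reduction (independent partial sums combined at the end) and is not what any particular compiler emits.

```cpp
#include <cassert>
#include <cstddef>

// Scalar reduction: adds the elements strictly in source order.
inline float sumInOrder(const float *A, std::size_t N) {
  float S = 0.0f;
  for (std::size_t I = 0; I < N; ++I)
    S += A[I];
  return S;
}

// Imitation of a two-lane vectorized reduction: even and odd elements
// accumulate separately and are combined once at the end.
inline float sumTwoLanes(const float *A, std::size_t N) {
  float Lane0 = 0.0f, Lane1 = 0.0f;
  std::size_t I = 0;
  for (; I + 1 < N; I += 2) {
    Lane0 += A[I];
    Lane1 += A[I + 1];
  }
  if (I < N)
    Lane0 += A[I];  // leftover element when N is odd
  return Lane0 + Lane1;
}

// Input chosen so that a small value is absorbed by a large one in one
// ordering but survives in the other.
static const float ReorderExample[4] = {1e20f, 1.0f, -1e20f, 1.0f};

inline void checkReorderExample() {
  // In order: 1e20 + 1 rounds to 1e20, then cancels to 0, then + 1.
  assert(sumInOrder(ReorderExample, 4) == 1.0f);
  // Two lanes: (1e20 - 1e20) + (1 + 1).
  assert(sumTwoLanes(ReorderExample, 4) == 2.0f);
}
```

Because the two orderings give different answers, a compiler following strict floating-point semantics cannot make this transformation on its own; a flag such as -ffast-math gives it permission to.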
