Displaying 20 results from an estimated 8000 matches similar to: "[LLVMdev] [RFC] Heuristic for complete loop unrolling"
2012 Nov 23
1
[LLVMdev] Disable loop unroll pass
Hi, Ivan:
Sorry for deviating from the topic a bit. As I told you before, I'm an LLVM
newbie, so I cannot give you a conclusive answer as to whether the proposed
interface is OK or not.
My personal opinion on these two interfaces is summarized below:
- hasZeroCostLoop()
pro: it clearly states the HW support.
con: Having a zero-cost loop doesn't imply the benefit a HW loop could
achieve.
2015 Jul 28
2
[LLVMdev] RFC: LoopEditor, a high-level loop transform toolkit
Hi Michael,
+llvmdev,Hal,Nadav
For testing, I was currently thinking of a two pronged approach. Lit tests
as you suggest with a dummy pass, probably with command line options to
define what transform to do, and unit tests to test the delegate behaviour
and return values.
I'll try and produce a mega patch with at least the loop vectoriser moved
over, then split it up again after review.
2012 Nov 22
2
[LLVMdev] Disable loop unroll pass
Hi, Gang:
I don't want to discuss Open64 internals on the LLVM mailing list. Let us
only focus on the design per se.
This mail and your previous mail combined give me the impression
that:
The only reason you introduced the specific operator for HW loops in
Scalar Opt is simply that
you have a hard time figuring out the trip count in CodeGen.
This might be true for Open64's
2012 Nov 23
0
[LLVMdev] Disable loop unroll pass
Hi Shuxin,
On 23/11/2012 00:17, Shuxin Yang wrote:
> Hi, Gang:
>
> I don't want to discuss Open64 internals on the LLVM mailing list. Let us
> only focus on the design per se.
> This mail and your previous mail combined give me the impression
> that:
>
> The only reason you introduced the specific operator for HW loops in
> Scalar Opt is simply that
>
2015 May 02
5
[LLVMdev] Modifying LoopUnrollingPass
Hi Zhoulai,
I am trying to modify "LoopUnrollPass" in LLVM, which produces multiple
copies of the loop equal to the loop unroll factor. Currently, on a multicore
architecture with, say, 3 cores, the execution goes like this:
for 3 cores, if there are 9 iterations of the loop:
core    iterations
1       0,3,6
2       1,4,7
3       2,5,8
But I want to
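The current distribution in the table above is cyclic (round-robin): core c, counting from 1, runs iterations c-1, c-1+3, c-1+6, and so on. A minimal sketch of that mapping follows. The helper name `iterationsForCore` is hypothetical and is not part of LLVM or of the poster's patch; it only reproduces the table.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not an LLVM API): lists the loop iterations that one
// core executes when iterations are dealt out round-robin across cores.
// Cores are numbered from 1, matching the table in the message.
std::vector<int> iterationsForCore(int core, int numCores, int numIterations) {
    assert(numCores > 0 && core >= 1 && core <= numCores);
    std::vector<int> iterations;
    // Start at the core's own offset and step by the number of cores.
    for (int i = core - 1; i < numIterations; i += numCores)
        iterations.push_back(i);
    return iterations;
}
```

With 3 cores and 9 iterations, calling `iterationsForCore(2, 3, 9)` yields the second row of the table.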
2017 Jan 18
2
llvm is getting slower, January edition
On 1/18/17 3:55 PM, Davide Italiano via llvm-dev wrote:
> On Tue, Jan 17, 2017 at 6:02 PM, Mikhail Zolotukhin
> <mzolotukhin at apple.com> wrote:
>> Hi,
>>
>> Continuing recent efforts in understanding compile time slowdowns, I looked at some historical data: I picked one test and tried to pin-point commits that affected its compile-time. The data I have is not 100%
2012 Apr 03
1
[LLVMdev] Possible typo in LoopUnrollPass.cpp
hi,
In "LoopUnrollPass.cpp", when trying to reduce unroll count to meet
the unroll threshold requirement in line 200 and line 206, variable
"CurrentThreshold" is used in the computation, instead of the variable
"Threshold", which is defined by:
// Determine the current unrolling threshold. While this is normally set
// from UnrollThreshold, it is overridden to
2015 Feb 09
3
[LLVMdev] aarch64 status for generating SIMD instructions
% clang -S -O3 -mcpu=cortex-a57 -ffast-math -Rpass-analysis=loop-vectorize dot.c
dot.c:15:1: remark: loop not vectorized: value that could not be identified as
reduction is used outside the loop [-Rpass-analysis=loop-vectorize]
}
^
dot.c:15:1: note: could not determine the original source location for :0:0
I found “llvm-as < /dev/null | llc -march=aarch64 -mattr=help” which listed a
2017 Jan 18
10
llvm is getting slower, January edition
Hi,
Continuing recent efforts in understanding compile time slowdowns, I looked at some historical data: I picked one test and tried to pin-point commits that affected its compile-time. The data I have is not 100% accurate, but hopefully it helps to provide an overview of what's going on with compile time in LLVM and give a better understanding of what changes usually impact compile time.
2013 Sep 27
2
[LLVMdev] Trip count and Loop Vectorizer
Hi,
I am trying to get a small loop to *not vectorize* for cases where it doesn't make sense. For instance, this loop:
void foo(int a[4][8], int n)
{
    int b[4][8];
    for (int i = 0; i < 4; i++) {
        for (int j = 0; j < n; j++) {
            a[i][j] = b[i][j];
        }
    }
}
* Has a maximum of 8 ints to copy. LLVM tries to use memcpy for the inner loop. It is not helpful to perform
2015 Jul 08
7
[LLVMdev] LLVM loop vectorizer
Hello.
I am trying to vectorize a CSR SpMV (sparse matrix vector multiplication) procedure
but the LLVM loop vectorizer is not able to handle such code.
I am using clang and LLVM version 3.4 (on Ubuntu 12.10). I use the -fvectorize option
with clang and -loop-vectorize with opt-3.4.
The CSR SpMV function is inspired from
2016 Aug 10
3
SCEV LoopTripCount
Hello,
I was doing some experiments with SCEV and especially the loop trip count.
Sorry for the dumb question, what is the easiest way to dump SCEV analysis
results on a .bc file?
On a side note, I wanted to see if we could optimize this function:
unsigned long kernel(long w, long h, long d) {
    unsigned long count = 0;
    for(int i = 0; i < w; ++i)
        for(int j = i; j < h; ++j)
            for(int k = j; k
2008 May 07
8
[LLVMdev] [PATCH] Split LoopUnroll pass into mechanism and policy
Hello Matthijs,
Separating mechanism from policy is a good thing for the LoopUnroll
pass. Instead of moving the policy to a subclass though, I think it'd
be better to move the mechanism, the unrollLoop function, out to be a
standalone utility function, with the LoopInfo object passed in
explicitly. FoldBlockIntoPredecessor would also be good to make into
a standalone utility function, since
2015 Aug 23
4
Scaling to many basic blocks
On Sat, Aug 22, 2015 at 11:57 PM, Michael Zolotukhin <mzolotukhin at apple.com>
wrote:
> Hi,
>
> Several passes would have trouble with such code (namely, GVN and
> JumpThreading).
Can you just choose not to run those particular passes? I suppose the big
problem would be if there's a problem with the code generation and related
stuff like instruction scheduling and
2017 Mar 01
2
Noisy benchmark results?
On 28 Feb 2017, at 22:50, Michael Zolotukhin <mzolotukhin at apple.com> wrote:
I also usually rerun suspiciously improved or regressed tests to verify the performance change. Most of the time, if it was just noise, the test doesn’t appear on another run. I wish LNT (or any other script) could do that for me :)
Michael
Doesn't the lnt runtest nt
2015 Aug 22
4
Scaling to many basic blocks
How well does LLVM scale to many basic blocks? Let's say you have a single
function consisting of a million basic blocks each with a few tens of
instructions (and assuming the whole thing isn't trivially repetitive so
the number of simultaneously live variables and whatever is large) and you
feed that through the optimisers into the backend code generator, will this
work okay, or will it
2008 May 09
0
[LLVMdev] [PATCH] Split LoopUnroll pass into mechanism and policy
Hi All,
The attached patch performs the splitting in the proposed manner.
Before applying the patch, please execute
svn cp lib/Transforms/Scalar/LoopUnroll.cpp lib/Transforms/Utils/UnrollLoop.cpp
to make the patch apply and preserve proper history.
Transforms/Utils/UnrollLoop.cpp contains the unrollLoop function, which is now
used by the LoopUnroll pass. I've also moved the
2017 Dec 12
3
[cfe-dev] Who wants faster LLVM/Clang builds?
On Mon, Dec 11, 2017 at 3:37 PM, Mikhail Zolotukhin via cfe-dev <
cfe-dev at lists.llvm.org> wrote:
> Hi Kim,
>
> On Dec 10, 2017, at 7:39 AM, Kim Gräsman <kim.grasman at gmail.com> wrote:
>
> Hi Michael,
>
> On Thu, Dec 7, 2017 at 3:16 AM, Michael Zolotukhin
> <mzolotukhin at apple.com> wrote:
>
>
> Nice to IWYU developers here:) I wonder how
2017 Dec 10
3
[cfe-dev] Who wants faster LLVM/Clang builds?
Hi Michael,
On Thu, Dec 7, 2017 at 3:16 AM, Michael Zolotukhin
<mzolotukhin at apple.com> wrote:
>
> Nice to IWYU developers here:) I wonder how hard it would be to run IWYU on
> LLVM/Clang (or, if it’s supposed to work, I wonder what I did wrong).
There are known problems with running IWYU over LLVM/Clang -- Zachary
Turner made an attempt a while back to get it up and running.
2013 Sep 27
0
[LLVMdev] Trip count and Loop Vectorizer
Hi Sriram,
Thanks for performing this analysis. The problem here, both for memcpy and the vectorizer, is that we can’t predict the size of “n”, even though the only use of ’n’ is for the loop bound for the alloca [4 x [8 x i32]]. If you change the unroll condition to TC >= 0 then you will disable loop unrolling for all loops because getSmallConstantTripCount returns an unsigned number. You