similar to: [LLVMdev] Should we enable Partial unrolling and Runtime unrolling on AArch64?

Displaying 20 results from an estimated 10000 matches similar to: "[LLVMdev] Should we enable Partial unrolling and Runtime unrolling on AArch64?"

2015 Mar 11
2
[LLVMdev] How to run two loop passes non-interleaved if they are registered one by one?
Hi Hal, James told me that in PassManagerBuilder.cpp, BarrierNoopPass is already used for this kind of purpose(though there's also a fixme saying it's hacking). I think it's a good idea to use this pass here. Thanks, Kevin 2015-03-11 17:05 GMT+08:00 Hal Finkel <hfinkel at anl.gov>: > ----- Original Message ----- > > From: "Kevin Qin" <kevinqindev at
2014 Jun 25
4
[LLVMdev] [cfe-dev] AArch64 Clang CLI interface proposal
Hi Tim, 2014-06-25 15:26 GMT+08:00 Tim Northover <t.p.northover at gmail.com>: > Hi Kevin, > > I assume you've looked at the GCC documentation in this area, since > your ideas are very similar: > https://gcc.gnu.org/onlinedocs/gcc/AArch64-Options.html. I actually > think that looks like a rational set of conventions too. > > The main difference appears to be
2015 Mar 11
2
[LLVMdev] How to run two loop passes non-interleaved if they are registered one by one?
Hi LLVM developers, I want to add LICM pass after loop unrolling pass in current optimization pipeline. Because both of them are loop passes, so if I registered them one by one, they will interleaved go through all loops in bottom up way within same loop pass manager. Loop unroling pass may create new inner loops from partial unrolling, and those newly created loops can be visited only if the
2015 Apr 01
3
[LLVMdev] why we assume malloc() always returns a non-null pointer in instruction combing?
Hi David and Mats, Thanks for your explanation. If my understanding is correct, it means we don't need to consider the side-effect of malloc/free unless compiling with -ffreestanding. Because without -ffreestanding, user defined malloc/free should be compatible with std library. It makes sense to me. My point is, in std library, malloc is allowed to return null if this malloc failed. Why
2015 Oct 16
2
question about llvm partial unrolling/runtime unrolling
Hi Hal, I did opt.exe -S -debug -loop-unroll -unroll-runtime=true -unroll-count=4 csShader.ll and it prints out: Args: opt.exe -S -debug -loop-unroll -unroll-runtime=true -unroll-count=4 csShader.ll Loop Unroll: F[build_cs_5_0] Loop %loop_entry Loop Size = 82 partially unrolling with count: 1 Thanks, Frances On Thu, Oct 15, 2015 at 9:35 PM, Hal Finkel <hfinkel at anl.gov>
2015 Oct 12
2
question about llvm partial unrolling/runtime unrolling
Hi, I am trying to do loop unrolling with loops that don't have constant loop counter. It is highly appreciated if anyone can help me on this. What I want to do is to turn loop (n) { <loop body> } into loop (n/4) { <loop body> <loop body> <loop body> <loop body> } loop (n%4) { <loop
2015 Apr 01
2
[LLVMdev] why we assume malloc() always returns a non-null pointer in instruction combing?
Hi Mats, I think Kevin's point is malloc can return 0, if malloc/free pair is optimized way, the semantic of the original would be changed. On the other hand, malloc/free are special functions, but programmers can still define their own versions by not linking std library, so we must assume malloc/free always have side-effect like other common functions, unless we know we will link std
2015 Mar 31
2
[LLVMdev] why we assume malloc() always returns a non-null pointer in instruction combing?
> I think we can do such optimization with operator new, because new never returns null. This is incorrect in the case of `new (std::nothrow) ...` - the whole point of `(std::nothrow)` is to tell new that it should return NULL in case of failure, rather than throw an exception (bad_alloc). But the point here is not the actual return value, but the fact that the compiler misses that the
2015 Mar 31
2
[LLVMdev] why we assume malloc() always returns a non-null pointer in instruction combing?
Hi, When looking into the bug in https://llvm.org/bugs/show_bug.cgi?id=21421, I found a regression test in Transforms/InstCombine/malloc-free-delete.ll against me to directly fix it. The test is, define i1 @foo() { ; CHECK-LABEL: @foo( ; CHECK-NEXT: ret i1 false %m = call i8* @malloc(i32 1) %z = icmp eq i8* %m, null call void @free(i8* %m) ret i1 %z } According to
2014 Jan 27
2
[LLVMdev] [cfe-dev] AArch64 Clang CLI interface proposal
Ping. Can I assume that we're ok with this interface proposal then? Amara -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20140127/49962cbc/attachment.html>
2014 Jun 25
3
[LLVMdev] [cfe-dev] AArch64 Clang CLI interface proposal
Hi, Recently, I committed a patch adding default features for '-mcpu'. And after that, Eric replied me here's a proposal toward using '-march' instead of '-mcpu'. As it's half a year later from original proposal, some background may changes. One thing worth to mention is, during this time, Apple Contributed its backend and introduced another new CPU type: cyclone.
2014 Jul 15
4
[LLVMdev] Partial loop unrolling
Hi, PS: It is a generic question related to partial loop unrolling, and nothing specific to LLVM. As far as partial loop unrolling is concerned, I could see following three different possibilities. Assume that unroll factor is 3. Original loop: for (i = 0; i < 10; i++) { do_foo(i); } 1. First possibility i = 0; do_foo(i++); do_foo(i++);
2014 Jul 31
2
[LLVMdev] For alias analysis, It's gcc too aggressive or LLVM need to improve?
Hi all, Recently, I found gcc can generate faster codes by localizing global variable inside loop and only write back once before function return. Gcc can do this because its alias analysis think it's safe. But for below case, gcc gives different result from -O0 and -O2. #include <stdio.h> struct heap {int index; int b;}; struct heap **ptr; int aa; int main() { struct heap element;
2015 Feb 04
2
[LLVMdev] Is this a bug with loop unrolling and TargetTransformInfo ?
Hi, I ran into this issue recently and wanted to know if it was a bug or expected behavior. In the R600 backend's TargetTransformInfo implementation, we were setting UnrollingPreferences::Count = UINT_MAX. This was a mistake as we should have been setting UnrollingPreferences::MaxCount instead. However, as a result of setting Count to UINT_MAX, this loop would be unrolled 15 times: if (b
2014 Jul 10
2
[LLVMdev] Help!!!!Help!!!! " LLVM ERROR: Cannot select: 0x9fc9680: i32 = fp32_to_fp16 0x9fc0750 [ID=16] " problem!!!!!!!!!!!!!!!!!!
Hi Daniel,    Thank you your replying.     Yes, the problem is about MIPS backend. You give me this message "There is limited support for the <8 x f16> type when MSA (MIPS SIMD Architecture) is enabled but even then scalar half-precision is not currently supported."  Could you give me some official link or some evidence? Thank you very much. Robin yalong at multicorewareinc.com
2016 May 05
4
LLVM issuse:AArch64 TargetParser
Hi everyone, I'm a member engineer of linaro's llvm team,coming from Spreadtrum.I am a new person on LLVM.Now I'm writing a Target Parser for AArch64,so options parsing of AArch64 about cpu & arch & fpu can be summary to one place. In the TargetParser,we assume "aarch64" and "arm64" are synonyms of armv8a(as they are only for armv8a,people usually do
2014 Mar 28
2
[LLVMdev] Contributing the Apple ARM64 compiler backend
Hi Chandler, > I'd like to hear from the existing AArch64 backend maintainers if they have > any concerns about getting #1 done right away? I agree (with my AArch64 maintainer hat on). The goal of merging these two backends will hopefully do a lot to make sure the eventual code is better than what exists now in either individual directory; and doing the real work towards that will be
2016 Apr 21
2
RFC: EfficiencySanitizer
> > > Will this technology allow us to pinpoint specific accesses that generally > have high latency (i.e. generally are cache misses)? This information is > useful for programmers, and is also useful as an input to loop unrolling, > instruction scheduling, and the like on ooo cores. > Won't hardware performance counter tell you which accesses are delinquent accesses? The
2016 Apr 21
2
RFC: EfficiencySanitizer
On Thu, Apr 21, 2016 at 1:57 PM, Hal Finkel <hfinkel at anl.gov> wrote: > > > ------------------------------ > > *From: *"Qin Zhao" <zhaoqin at google.com> > *To: *"Hal Finkel" <hfinkel at anl.gov> > *Cc: *"Derek Bruening" <bruening at google.com>, > efficiency-sanitizer at google.com, "llvm-dev" <llvm-dev
2012 May 24
0
[LLVMdev] data dependency and fully loop unrolling
Cheng, Are you looking specifically for an analysis that can 'undo' the effects of loop unrolling, or do you want dependency analysis that can run on the loop prior to unrolling? For dependency analysis on loops (prior to unrolling) Preston and Sanjoy have been working on this, see: http://lists.cs.uiuc.edu/pipermail/llvmdev/2012-May/049769.html