similar to: [LLVMdev] LICM promoting memory to scalar

Displaying 20 results from an estimated 1000 matches similar to: "[LLVMdev] LICM promoting memory to scalar"

2014 Sep 02
2
[LLVMdev] LICM promoting memory to scalar
I think gcc is right. It inserted a branch for n == 0 (the cbz at the top), so that's not a problem. In all other regards, this is safe: if you examine the sequence of loads and stores, it eliminated all but the first load and all but the last store. How's that unsafe? If I had to guess, the bug here is that LLVM doesn't want to hoist the load over the condition (which it is right
2014 Sep 03
3
[LLVMdev] LICM promoting memory to scalar
Thanks for the background on the concurrent memory model. So, is it sufficient that the loop entry is guarded by condition (cbz at top) for preventing the race? The loop entry will be guarded by condition if loop has been rotated by loop rotate pass. Since LICM runs after loop rotate, we can use ScalarEvolution::isLoopEntryGuardedByCond to check if we can speculatively execute load without
2020 Jun 01
3
Aarch64: unaligned access despite -mstrict-align
Hi, I experienced a crash in code compiled with Clang 10.0.0 due to a misaligned 64-bit data access. The (ARMv8) CPU is configured with SCTL.A == 1 (alignment check enable). With SCTLR.A == 0 the code runs as expected. After some investigation I came up with the following reproducer: ---8<-------8<-------8<-------8<-------8<-------8<-------8<------- $ cat test.c extern char
2015 Feb 04
2
[LLVMdev] Question on Machine Combiner Pass
Ping From: Mandeep Singh Grang [mailto:mgrang at codeaurora.org] Sent: Tuesday, February 03, 2015 4:34 PM To: 'llvmdev at cs.uiuc.edu' Cc: 'ghoflehner at apple.com'; 'apazos at codeaurora.org'; mgrang at codeaurora.org Subject: Question on Machine Combiner Pass Hi, In the file lib/CodeGen/MachineCombiner.cpp I see that in the function
2015 Feb 19
2
[LLVMdev] ScheduleDAGInstrs computes deps using IR Values that may be invalid
Hi All, I've encountered an issue where tail merging MIs is causing a problem with the post-RA MI scheduler dependency analysis and I'm not sure of the best way to address the problem. In my case, the branch folding pass (lib/CodeGen/BranchFolding.cpp) is merging common code from BB#14 and BB#15 into BB#16. It's clear that there are 4 common instructions (marked with an *) in BB#14
2017 May 30
3
[atomics][AArch64] Possible bug in cmpxchg lowering
Currently the AtomicExpandPass will lower the following IR: define i1 @foo(i32* %obj, i32 %old, i32 %new) { entry: %v0 = cmpxchg weak volatile i32* %obj, i32 %old, i32 %new _*release acquire*_ %v1 = extractvalue { i32, i1 } %v0, 1 ret i1 %v1 } to the equivalent of the following on AArch64: _*ldxr w8, [x0]*_ cmp w8, w1 b.ne .LBB0_3 // BB#1:
2009 Mar 24
1
Discriminant analysis - stepwise procedure
Dear R users, I have some environmental variables and I need to find the best combination of them in order to separate two main groups (coded 1 and 2). I have performed a discriminant analysis using the stepclass function as a method for selecting the most relevant environmental variables. The problem is that this function includes a parameter (start.vars) and my results change a lot when I
2017 Nov 17
2
[GlobalISel][AArch64] Toward flipping the switch for O0: Please give it a try!
Hi Oliver, Thanks for trying this. Could you file a different PR for each of the problem you found and reference the umbrella PR: http://llvm.org/PR35347? <http://llvm.org/PR35347?> Thanks, -Quentin > On Nov 17, 2017, at 8:17 AM, Oliver Stannard <oliver.stannard at arm.com> wrote: > > Hi Quentin, > > One more reproducer, this time with small (<64bit) values
2017 Sep 19
0
[iovisor-dev] [PATCH RFC 3/4] New 32-bit register set
Hi, Jiong, Thanks for the patch! It is a great start to support 32bit register in BPF. In the past, I have studied a little bit to see whether 32bit register support may reduce the number of unnecessary shifts on x86_64 and improve the performance. Looking through a few bpf programs and it looks like the opportunity is not great, but still nice to have if we have this capability. As you
2014 Jun 27
3
[LLVMdev] Contributing the Apple ARM64 compiler backend
AArch64AddressTypePromotion.cpp does a fair bit of work to help make these things work out well. It could probably be generalized for non-AArch64 targets as per the comment in the file header. > On Jun 26, 2014, at 10:42 AM, Sanjay Patel <spatel at rotateright.com> wrote: > > Cool HW trick. :) > Are those 'sxtw' ops free? > That’ll depend on the details of the
2017 Nov 27
2
[GlobalISel][AArch64] Toward flipping the switch for O0: Please give it a try!
Thanks all. Amara, could you take a look? > On Nov 20, 2017, at 3:06 AM, Oliver Stannard <oliver.stannard at arm.com> wrote: > > Hi Quentin, > > I’ve raised: > https://bugs.llvm.org/show_bug.cgi?id=35359 <https://bugs.llvm.org/show_bug.cgi?id=35359> > https://bugs.llvm.org/show_bug.cgi?id=35360 <https://bugs.llvm.org/show_bug.cgi?id=35360> >
2014 Jun 26
2
[LLVMdev] Contributing the Apple ARM64 compiler backend
Hi Sanjay, The behaviour I’m talking about I’ve actually pinned down to CodeGenPrepare not working too well with ISA’s that don’t have a good scaled load. I have a patch to fix it that is going through performance testing now. Your testcase seems specific to x86 – for aarch64 we get the rather spiffy: _Z3fooPii: // @_Z3fooPii // BB#0:
2016 May 27
2
Handling post-inc users in LSR
Hello, For a very simple loop where all IV users are post-inc users, I observed redundant add instructions in AArch64. From LSR debug, I can see initial formula for icmp is the one that transformed to a post-inc form in OptimizeLoopTermCond() and later expanded in post-inc mode. Based on the observation that the icmp is already a post-inc user, I hacked LSR to prevent the icmp from being
2017 Nov 14
6
[GlobalISel][AArch64] Toward flipping the switch for O0: Please give it a try!
To give an update here, we actually are not missing a mapping. The code complains because we are copying around a fp16 into a gpr32 and that shouldn’t be done with a copy (default mapping). I extended the repairing code to issue G_ANYEXT in those cases instead of asserting. However, now, I have to teach instruction select about those ANYEXT otherwise we’ll fallback in that case. But that’s a
2020 Jul 15
2
[MTE] Tagging Globals
Hello, We're evaluating memory tagging (MTE) on some internal workloads. We noticed that stack variables are tagged by an instrumentation pass and heap objects are handled by the allocator (Scudo). How about global variables? We tried a simple case using -march=armv8a+memtag -fsanitize=memtag, but found no tagging: Are we missing anything or tagging globals is still in progress? int
2016 May 27
0
Handling post-inc users in LSR
> On May 27, 2016, at 2:50 PM, via llvm-dev <llvm-dev at lists.llvm.org> wrote: > > Hello, > > For a very simple loop where all IV users are post-inc users, I observed redundant add instructions in AArch64. > > From LSR debug, I can see initial formula for icmp is the one that transformed to a post-inc form in OptimizeLoopTermCond() and later expanded in post-inc
2020 Jul 15
2
[MTE] Tagging Globals
Thanks for the update, Phillips. Yes, please add me, Stephen and Ana (CCed) to Phabricator reviews. Zhaoshi From: Mitch Phillips <mitchp at google.com> Sent: Tuesday, July 14, 2020 19:10 To: Zhaoshi Zheng <zhaoshiz at quicinc.com> Cc: llvm-dev at lists.llvm.org; Stephen Long <steplong at quicinc.com> Subject: [EXT] Re: [llvm-dev] [MTE] Tagging Globals Hi Zhaoshi, Currently
2018 Dec 05
2
Strange regalloc behaviour: one more available register causes much worse allocation
Preamble -------- While working on an IR-level optimisation completely unrelated to register allocation I happened to trigger some really strange register allocator behaviour causing a large regression in bzip2 in spec2006. I've been trying to fix that regression before getting the optimisation patch committed, because I don't want to regress spec2006, but I'm basically fumbling in
2018 Dec 05
3
Strange regalloc behaviour: one more available register causes much worse allocation
enableAdvancedRASplitCost() does the same thing as ConsiderLocalIntervalCost, but as a subtarget option instead of a command-line option, and as I’ve said it doesn’t help because it’s a non-local interval causing the eviction chain (RAGreedy::splitCanCauseEvictionChain only considers the local interval for a single block, and it’s unclear to me how to make it handle a non-local interval). John
2020 Jun 01
2
Aarch64: unaligned access despite -mstrict-align
Sorry, quick message to ignore what I wrote before, I got myself confused (probably you too), With a recent trunk build I get this: f: adrp x8, g ldr x8, [x8, :lo12:g] mov w2, #16 mov x1, x0 mov x0, x8 b memcmp This looks more correct, and I need to look a bit more into this (and how clang 10.0.0 behaves).