thr3ads.net - similar to: "Aarch64: unaligned access despite -mstrict-align"

Displaying 20 results from an estimated 2000 matches similar to: "Aarch64: unaligned access despite -mstrict-align"

Aarch64: unaligned access despite -mstrict-align

2020 Jun 01

Aarch64: unaligned access despite -mstrict-align

Sorry, quick message to ignore what I wrote before, I got myself confused (probably you too), With a recent trunk build I get this: f: adrp x8, g ldr x8, [x8, :lo12:g] mov w2, #16 mov x1, x0 mov x0, x8 b memcmp This looks more correct, and I need to look a bit more into this (and how clang 10.0.0 behaves).

Hardware ASan Generating Unknown Instruction

2020 Jun 22

Hardware ASan Generating Unknown Instruction

Hi, I am trying to execute a simple hello world program compiled like so: path/to/compiled/clang -o test --target=aarch64-linux-gnu -march=armv8.5-a -fsanitize=hwaddress --sysroot=/usr/aarch64-linux-gnu/ -L/usr/lib/gcc/aarch64-linux-gnu/10.1.0/ -g test.c However, when I look at the disassembly, there is an unknown instruction listed at 0x2d51c: 000000000002d4c0 main: 2d4c0: ff c3 00 d1

[MTE] Tagging Globals

2020 Jul 15

[MTE] Tagging Globals

Hello, We're evaluating memory tagging (MTE) on some internal workloads. We noticed that stack variables are tagged by an instrumentation pass and heap objects are handled by the allocator (Scudo). How about global variables? We tried a simple case using -march=armv8a+memtag -fsanitize=memtag, but found no tagging: Are we missing anything or tagging globals is still in progress? int

[MTE] Tagging Globals

2020 Jul 15

[MTE] Tagging Globals

Thanks for the update, Phillips. Yes, please add me, Stephen and Ana (CCed) to Phabricator reviews. Zhaoshi From: Mitch Phillips <mitchp at google.com> Sent: Tuesday, July 14, 2020 19:10 To: Zhaoshi Zheng <zhaoshiz at quicinc.com> Cc: llvm-dev at lists.llvm.org; Stephen Long <steplong at quicinc.com> Subject: [EXT] Re: [llvm-dev] [MTE] Tagging Globals Hi Zhaoshi, Currently

Handling post-inc users in LSR

2016 May 27

Handling post-inc users in LSR

Hello, For a very simple loop where all IV users are post-inc users, I observed redundant add instructions in AArch64. From LSR debug, I can see initial formula for icmp is the one that transformed to a post-inc form in OptimizeLoopTermCond() and later expanded in post-inc mode. Based on the observation that the icmp is already a post-inc user, I hacked LSR to prevent the icmp from being

Handling post-inc users in LSR

2016 May 27

Handling post-inc users in LSR

> On May 27, 2016, at 2:50 PM, via llvm-dev <llvm-dev at lists.llvm.org> wrote: > > Hello, > > For a very simple loop where all IV users are post-inc users, I observed redundant add instructions in AArch64. > > From LSR debug, I can see initial formula for icmp is the one that transformed to a post-inc form in OptimizeLoopTermCond() and later expanded in post-inc

Issue with __attribute__((constructor)) and -Os -fno-common

2020 Jun 12

Issue with __attribute__((constructor)) and -Os -fno-common

On 6/11/20 11:25 PM, James Y Knight wrote: > The global constructor was removed by setting the initial value of "val" to > 1 instead of 0. So, the behavior of this program is preserved. Doesn't look > like erroneous behavior. OK, my example is too simplified indeed. Please consider the following instead: int val; static void __attribute__((constructor)) init_fn(void) {

[LLVMdev] LICM promoting memory to scalar

2014 Sep 02

[LLVMdev] LICM promoting memory to scalar

All, If we can speculatively execute a load instruction, why isn’t it safe to hoist it out by promoting it to a scalar in LICM pass? There is a comment in LICM pass that if a load/store is conditional then it is not safe because it would break the LLVM concurrency model (See commit 73bfa4a). It has an IR test for checking this in test/Transforms/LICM/scalar-promote-memmodel.ll However, I have

[hexagon][PowerPC] code regression (sub-optimal code) on LLVM 9 when generating hardware loops, and the "llvm.uadd" intrinsic.

2019 Jun 30

[hexagon][PowerPC] code regression (sub-optimal code) on LLVM 9 when generating hardware loops, and the "llvm.uadd" intrinsic.

Hi All, The following code : void hexagon2( int *a, int *res ) { int i = 100; while ( i-- ) { *res++ = *a++; } } gets compiled as a sub-optimal Software loop on LLVM 9.0 instead of a Hardware loop, whereas it was compiled as a Hardware Loop in LLVM 7.0. This is the final assembly code generated by LLVM 9.0 : .text .file "main.c" .globl hexagon2 // --

[LLVMdev] LICM promoting memory to scalar

2014 Sep 02

[LLVMdev] LICM promoting memory to scalar

I think gcc is right. It inserted a branch for n == 0 (the cbz at the top), so that's not a problem. In all other regards, this is safe: if you examine the sequence of loads and stores, it eliminated all but the first load and all but the last store. How's that unsafe? If I had to guess, the bug here is that LLVM doesn't want to hoist the load over the condition (which it is right

Issue with __attribute__((constructor)) and -Os -fno-common

2020 Jun 11

Issue with __attribute__((constructor)) and -Os -fno-common

Hi, I think that Clang erroneously discards a function annotated with __attribute__((constructor)) when flags -Os -fno-common are given. Test case below. What do you think? Thanks. ----8<--------8<--------8<--------8<--------8<--------8<-------- $ cat ctor.c int val; static void __attribute__((constructor)) init_fn(void) { val = 1; } int main(int argc, char *argv[])

Comparing Clang and GCC: only clang stores updated value in each iteration.

2018 Sep 20

Comparing Clang and GCC: only clang stores updated value in each iteration.

Hi, I have a benchmark (mcf) that is currently slower when compiled with clang compared to gcc 8 (~10%). It seems that a hot loop has a few differences, where one interesting one is that while clang stores an incremented value in each iteration, gcc waits and just stores the final value just once after the loop. The value is a global variable. I wonder if this is something clang does not do

[LLVMdev] LICM promoting memory to scalar

2014 Sep 03

[LLVMdev] LICM promoting memory to scalar

Thanks for the background on the concurrent memory model. So, is it sufficient that the loop entry is guarded by condition (cbz at top) for preventing the race? The loop entry will be guarded by condition if loop has been rotated by loop rotate pass. Since LICM runs after loop rotate, we can use ScalarEvolution::isLoopEntryGuardedByCond to check if we can speculatively execute load without

[LLVMdev] dmb ishld in AArch64

2014 Dec 09

[LLVMdev] dmb ishld in AArch64

Hi, I got an optimization problem (O1, O2) regarding memory barrier “dmb ishld” I find in the test/CodeGen/AArch64/intrinsics-memory-barrier.ll , it’s stated that memory access around DMB should not be reordered, but when compiling the Linux kernel, I found load/store in static inline void hlist_add_before_rcu(struct hlist_node *n, struct hlist_node *next) { n->pprev

A code layout related side-effect introduced by rL318299

2017 Dec 19

A code layout related side-effect introduced by rL318299

Hi, Recently 10% performance regression on an important benchmark showed up after we integrated https://reviews.llvm.org/rL318299. The analysis showed that rL318299 triggered loop rotation on an multi exits loop, and the loop rotation introduced code layout issue. The performance regression is a side-effect of rL318299. I got two testcases a.ll and b.ll attached to illustrate the problem. a.ll

A code layout related side-effect introduced by rL318299

2017 Dec 19

A code layout related side-effect introduced by rL318299

On Mon, Dec 18, 2017 at 5:46 PM Xinliang David Li <davidxl at google.com> wrote: > The introduction of cleanup.cond block in b.ll without loop-rotation > already makes the layout worse than a.ll. > > > Without introducing cleanup.cond block, the layout out is > > entry->while.cond -> while.body->ret > > All the arrows are hot fall through edges which is

Hardware ASan Generating Unknown Instruction

2020 Jun 22

Hardware ASan Generating Unknown Instruction

I suspect that this is hitting the issue that I mentioned here: https://reviews.llvm.org/D65857#1621335 We may need to do what I suggested there and restrict global tag entropy on non-Android Linux to 7 bits. You can try working around this issue for now by using lld as the linker (-fuse-ld=lld). Peter On Mon, Jun 22, 2020 at 1:37 PM Mitch Phillips via llvm-dev < llvm-dev at

[LLVMdev] Pseudo load and store instructions for AArch64

2014 Aug 22

[LLVMdev] Pseudo load and store instructions for AArch64

Hi Renato, > > I'm trying to add pseudo 64-bit load and store instructions for AArch64, which > > should have latencies set to "1" while being otherwise exactly the same as > > normal load and store instructions. > > Can I ask why would you need that? This is the only way I found to stop Machine Instruction Scheduler from reordering load and store

Hardware ASan Generating Unknown Instruction

2020 Jun 22

Hardware ASan Generating Unknown Instruction

Thanks for the confirmation. From the assembly that was sent on the other branch of the thread: > .set .L.str, .L.str.hwasan-3458764513820540928 -3458764513820540928 = 0xd0 << 56 i.e. a "negative" tag. So this appears to be the issue exactly. Peter On Mon, Jun 22, 2020 at 1:55 PM Derrick McKee <derrick.mckee at gmail.com> wrote: > Using lld fixes this issue. >

Sink redundant spill after RA

2018 Feb 22

Sink redundant spill after RA

Hi All, I found some cases where a spill of a live range in a block is reloaded only in one of its successors, and there is no reload in other paths through other successors. Since the spill is reloaded only in a certain path, it must be okay to sink such spill close to its reloads. In the AArch64 code below, there is a spill(x2) in the entry, but this value is reloaded only in %bb.1, not in

similar to: Aarch64: unaligned access despite -mstrict-align