thr3ads.net - search: "csel"

[LLVMdev] Contributing the Apple ARM64 compiler backend

2014 Jun 26

2

...mark. Most benchmarks are less than 5% behind GCC.
>
> Because of the licencing of SPEC, I have to be quite restricted in what I
> say and I can't give any numbers - sorry about that.
>
> We are focussing on Cortex-A57, and the things we've identified so far are:
> * The CSEL instruction behaves worse than the equivalent branch structure
> in at least one benchmark. In an out of order core, select-like instructions
> are going to be slower than their branched equivalent if the branch is
> predictable due to CSEL having two dependencies.
>
> * Redun...

[LLVMdev] Contributing the Apple ARM64 compiler backend

2014 Jun 26

2

..., and 25% ahead on one benchmark. Most benchmarks are less than 5% behind GCC.

Because of the licencing of SPEC, I have to be quite restricted in what I say and I can't give any numbers - sorry about that.

We are focussing on Cortex-A57, and the things we've identified so far are:
* The CSEL instruction behaves worse than the equivalent branch structure in at least one benchmark. In an out of order core, select-like instructions are going to be slower than their branched equivalent if the branch is predictable due to CSEL having two dependencies.
* Redundant calculations inside if c...
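The select-versus-branch trade-off in this snippet can be illustrated with a small C sketch. It is not taken from the thread: the function names are hypothetical, and whether a given compiler emits `csel` or a conditional branch for either form depends on the compiler, target CPU and optimization flags.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical example. Written as a ternary, which AArch64 compilers
 * commonly (but not necessarily) lower to cmp + csel. A csel result
 * cannot be produced until the flags and its source registers are ready. */
int32_t max_select(int32_t a, int32_t b) {
    return (a > b) ? a : b;
}

/* Same semantics with explicit control flow. If the compiler keeps the
 * branch and the branch predicts well, an out-of-order core can run ahead
 * on the predicted path instead of waiting for the comparison. */
int32_t max_branch(int32_t a, int32_t b) {
    if (a > b) {
        return a;
    }
    return b;
}

/* Sanity check that both forms compute the same value. */
void check_max_forms_agree(void) {
    assert(max_select(3, 7) == max_branch(3, 7));
    assert(max_select(7, 3) == max_branch(7, 3));
    assert(max_select(-1, -1) == max_branch(-1, -1));
}
```

Comparing the generated assembly for the two functions at the same optimization level is one way to see which form a particular compiler picks; the two may well compile to identical code.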

[LLVMdev] Contributing the Apple ARM64 compiler backend

2014 Jun 24

5

Eric Christopher <echristo <at> gmail.com> writes:
>
> > The big pain issues I see merging from ARM64 to AArch64 are:
> > 1. Apple have created a fairly complete scheduling model already for
> > ARM64, and we'd have to merge the partial? model in AArch64 and theirs. We
> > risk regressing performance on Apple's targets here, and we can't

[LLVMdev] LICM promoting memory to scalar

2014 Sep 02

3

        ...cbz w0, .L1
        adrp x6, globalvar
        add w5, w0, w0, lsr 31
        ldr w3, [x6,#:lo12:globalvar]    <== hoist load of globalvar
        mov w2, 0
        asr w5, w5, 1
.L4:
        cmp w5, w2
        add w2, w2, w1
        add w4, w3, w1
        csel w3, w4, w3, hi
        cmp w2, w0
        bcc .L4
        str w3, [x6,#:lo12:globalvar]    <== sink store of globalvar
.L1:
        ret
        .cfi_endproc
.LFE0:
        .size _Z3fooii, .-_Z3fooii
        .ident "GCC: (crosstool-NG linaro-1.13.1-4.8-2014.01 - Linaro GCC 2...
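The assembly above can be read back into source form. The sketch below is a reconstruction under assumptions, not the code from the thread: the parameter names `n` and `incr` and the helper `foo_promoted` are hypothetical, and the mangled name `_Z3fooii` indicates the original was compiled as C++ while this sketch is plain C. `foo` is a loop consistent with the listing; `foo_promoted` writes out by hand what the hoisted load and sunk store do.

```c
#include <assert.h>

int globalvar;

/* Assumed source shape: the unsigned comparisons match the hi/bcc
 * condition codes in the listing, and n / 2 matches the add/asr pair. */
void foo(int n, int incr) {
    unsigned int i;
    for (i = 0; i < (unsigned int)n; i += incr) {
        if (i < (unsigned int)(n / 2)) {
            globalvar += incr;
        }
    }
}

/* Hand-written equivalent of the promoted loop: globalvar lives in a
 * local (w3 in the listing) for the whole loop. */
void foo_promoted(int n, int incr) {
    if (n == 0) {                  /* cbz w0, .L1 */
        return;
    }
    int tmp = globalvar;           /* hoisted load */
    unsigned int i = 0;
    do {
        if (i < (unsigned int)(n / 2)) {
            tmp += incr;           /* add + csel ..., hi */
        }
        i += incr;
    } while (i < (unsigned int)n); /* cmp + bcc .L4 */
    globalvar = tmp;               /* sunk store, executed unconditionally */
}

/* Sanity check that both forms leave the same value in globalvar. */
void check_promotion_matches(void) {
    globalvar = 0;
    foo(10, 2);
    int expected = globalvar;
    globalvar = 0;
    foo_promoted(10, 2);
    assert(globalvar == expected);
}
```

Note that in the promoted form the store to `globalvar` happens once the loop is entered, even if no iteration takes the `if`; in the original form the store happens only on iterations where the condition holds.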

[LLVMdev] LICM promoting memory to scalar

2014 Sep 02

2

...> ldr w3, [x6,#:lo12:globalvar]    <== hoist load of globalvar
>> mov w2, 0
>> asr w5, w5, 1
>> .L4:
>> cmp w5, w2
>> add w2, w2, w1
>> add w4, w3, w1
>> csel w3, w4, w3, hi
>> cmp w2, w0
>> bcc .L4
>> str w3, [x6,#:lo12:globalvar]    <== sink store of globalvar
>> .L1:
>> ret
>> .cfi_endproc
>> .LFE0:
>> .size _Z3fooii,...

How to find out the default CPU / Features String for a given triple?

2020 Jan 23

3

...lxnum,+crc,-crypto,-custom-cheap-as-move,-cyclone,-disable-latency-sched-heuristic,+dit,+dotprod,-exynos-cheap-as-move,-exynosm1,-exynosm2,-exynosm3,-exynosm4,-falkor,+fmi,-force-32bit-jump-tables,+fp-armv8,-fp16fml,+fptoint,-fullfp16,-fuse-address,+fuse-aes,-fuse-arith-logic,-fuse-crypto-eor,-fuse-csel,-fuse-literals,+jsconv,-kryo,+lor,+lse,-lsl-fast,+mpam,-mte,+neon,-no-neg-immediates,+nv,+pa,+pan,+pan-rwv,+perfmon,-predictable-select-expensive,+predres,-rand,+ras,+rasv8_4,+rcpc,+rcpc-immo,+rdm,-reserve-x1,-reserve-x10,-reserve-x11,-reserve-x12,-reserve-x13,-reserve-x14,-reserve-x15,-reserve-x18...

[LLVMdev] LICM promoting memory to scalar

2014 Sep 03

3

        ... adrp x6, globalvar
        add w5, w0, w0, lsr 31
        ldr w3, [x6,#:lo12:globalvar]    <== hoist load of globalvar
        mov w2, 0
        asr w5, w5, 1
.L4:
        cmp w5, w2
        add w2, w2, w1
        add w4, w3, w1
        csel w3, w4, w3, hi
        cmp w2, w0
        bcc .L4
        str w3, [x6,#:lo12:globalvar]    <== sink store of globalvar
.L1:
        ret
        .cfi_endproc
.LFE0:
        .size _Z3fooii, .-_Z3fooii
        .ident "GCC: (crosstool-NG linaro-1.13.1-4.8-20...

[LLVMdev] Contributing the Apple ARM64 compiler backend

2014 Jun 27

3

...rk. Most benchmarks are less than 5% behind GCC.
>
> Because of the licencing of SPEC, I have to be quite restricted in what I
> say and I can't give any numbers - sorry about that.
>
> We are focussing on Cortex-A57, and the things we've identified so far are:
> * The CSEL instruction behaves worse than the equivalent branch structure
> in at least one benchmark. In an out of order core, select-like instructions
> are going to be slower than their branched equivalent if the branch is
> predictable due to CSEL having two dependencies.
>
> * Redundant...
