Displaying 8 results from an estimated 8 matches for "csel".

2014 Jun 26
2
[LLVMdev] Contributing the Apple ARM64 compiler backend
...mark. Most benchmarks are less than 5% behind GCC. > > Because of the licencing of SPEC, I have to be quite restricted in what I > say and I can't give any numbers - sorry about that. > > We are focussing on Cortex-A57, and the things we've identified so far are: > * The CSEL instruction behaves worse than the equivalent branch structure > in at least one benchmark. In an out of order core, select-like > instructions > are going to be slower than their branched equivalent if the branch is > predictable due to CSEL having two dependencies. > > * Redun...
2014 Jun 26
2
[LLVMdev] Contributing the Apple ARM64 compiler backend
..., and 25% ahead on one benchmark. Most benchmarks are less than 5% behind GCC. Because of the licencing of SPEC, I have to be quite restricted in what I say and I can't give any numbers - sorry about that. We are focussing on Cortex-A57, and the things we've identified so far are: * The CSEL instruction behaves worse than the equivalent branch structure in at least one benchmark. In an out of order core, select-like instructions are going to be slower than their branched equivalent if the branch is predictable due to CSEL having two dependencies. * Redundant calculations inside if c...
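To make the select-versus-branch distinction in the snippet above concrete, here is a minimal C sketch. The function names are hypothetical, and a compiler is free to lower either form to CSEL or to a conditional branch; the sketch only shows the two source-level shapes being compared, not what any particular backend emits.

```c
/* Select form: both `a` and `b` are operands of the result, so on a
 * CSEL-style lowering the result waits for both inputs (and the flags). */
int pick_select(int cond, int a, int b) {
    return cond ? a : b;
}

/* Branch form: only the value on the taken path is needed. If the branch
 * is well predicted, the other value is never waited on. */
int pick_branch(int cond, int a, int b) {
    if (cond) {
        return a;
    }
    return b;
}
```

Both functions return the same value for the same arguments; any performance difference depends on how the compiler lowers them and on how predictable `cond` is.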
2014 Jun 24
5
[LLVMdev] Contributing the Apple ARM64 compiler backend
Eric Christopher <echristo <at> gmail.com> writes: > > > The big pain issues I see merging from ARM64 to AArch64 are: > > 1. Apple have created a fairly complete scheduling model already for > > ARM64, and we'd have to merge the partial? model in AArch64 and theirs. We > > risk regressing performance on Apple's targets here, and we can't
2014 Sep 02
3
[LLVMdev] LICM promoting memory to scalar
...
        cbz     w0, .L1
        adrp    x6, globalvar
        add     w5, w0, w0, lsr 31
        ldr     w3, [x6,#:lo12:globalvar]    <== hoist load of globalvar
        mov     w2, 0
        asr     w5, w5, 1
.L4:
        cmp     w5, w2
        add     w2, w2, w1
        add     w4, w3, w1
        csel    w3, w4, w3, hi
        cmp     w2, w0
        bcc     .L4
        str     w3, [x6,#:lo12:globalvar]    <== sink store of globalvar
.L1:
        ret
        .cfi_endproc
.LFE0:
        .size   _Z3fooii, .-_Z3fooii
        .ident  "GCC: (crosstool-NG linaro-1.13.1-4.8-2014.01 - Linaro GCC 2...
2014 Sep 02
2
[LLVMdev] LICM promoting memory to scalar
...t; ldr w3, [x6,#:lo12:globalvar] <== hoist load of globalvar
>> mov w2, 0
>> asr w5, w5, 1
>> .L4:
>> cmp w5, w2
>> add w2, w2, w1
>> add w4, w3, w1
>> csel w3, w4, w3, hi
>> cmp w2, w0
>> bcc .L4
>> str w3, [x6,#:lo12:globalvar] <== sink store of globalvar
>> .L1:
>> ret
>> .cfi_endproc
>> .LFE0:
>> .size _Z3fooii,...
2020 Jan 23
3
How to find out the default CPU / Features String for a given triple?
...lxnum,+crc,-crypto,-custom-cheap-as-move,-cyclone,-disable-latency-sched-heuristic,+dit,+dotprod,-exynos-cheap-as-move,-exynosm1,-exynosm2,-exynosm3,-exynosm4,-falkor,+fmi,-force-32bit-jump-tables,+fp-armv8,-fp16fml,+fptoint,-fullfp16,-fuse-address,+fuse-aes,-fuse-arith-logic,-fuse-crypto-eor,-fuse-csel,-fuse-literals,+jsconv,-kryo,+lor,+lse,-lsl-fast,+mpam,-mte,+neon,-no-neg-immediates,+nv,+pa,+pan,+pan-rwv,+perfmon,-predictable-select-expensive,+predres,-rand,+ras,+rasv8_4,+rcpc,+rcpc-immo,+rdm,-reserve-x1,-reserve-x10,-reserve-x11,-reserve-x12,-reserve-x13,-reserve-x14,-reserve-x15,-reserve-x18...
2014 Sep 03
3
[LLVMdev] LICM promoting memory to scalar
...
        adrp    x6, globalvar
        add     w5, w0, w0, lsr 31
        ldr     w3, [x6,#:lo12:globalvar]    <== hoist load of globalvar
        mov     w2, 0
        asr     w5, w5, 1
.L4:
        cmp     w5, w2
        add     w2, w2, w1
        add     w4, w3, w1
        csel    w3, w4, w3, hi
        cmp     w2, w0
        bcc     .L4
        str     w3, [x6,#:lo12:globalvar]    <== sink store of globalvar
.L1:
        ret
        .cfi_endproc
.LFE0:
        .size   _Z3fooii, .-_Z3fooii
        .ident  "GCC: (crosstool-NG linaro-1.13.1-4.8-20...
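For readers following the listing above, here is a rough C rendering of what the quoted instructions compute. It is a reading of the assembly, not the source posted in the thread: the name `foo_promoted`, the parameter names, and the integer types are assumptions. It shows the pattern under discussion: the global is loaded once before the loop, updated in a register through a select, and stored once after the loop.

```c
#include <stdint.h>

/* Assumed stand-in for the global the listing calls `globalvar`. */
uint32_t globalvar;

/* Hypothetical C equivalent of the quoted loop; mirrors it step by step. */
void foo_promoted(int32_t n, int32_t step) {
    if (n == 0) {
        return;                              /* cbz  w0, .L1 */
    }
    uint32_t half = (uint32_t)(n / 2);       /* add ..., lsr 31 ; asr ..., 1 */
    uint32_t acc = globalvar;                /* ldr: load hoisted out of the loop */
    uint32_t i = 0;                          /* mov  w2, 0 */
    do {
        int take = half > i;                 /* cmp  w5, w2 (unsigned "hi") */
        i += (uint32_t)step;                 /* add  w2, w2, w1 */
        uint32_t sum = acc + (uint32_t)step; /* add  w4, w3, w1 */
        acc = take ? sum : acc;              /* csel w3, w4, w3, hi */
    } while (i < (uint32_t)n);               /* cmp  w2, w0 ; bcc .L4 */
    globalvar = acc;                         /* str: store sunk below the loop */
}
```

Note that the store after the loop runs whenever `n` is nonzero, even if the condition never held on any iteration; that unconditional write is what the promotion introduces.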
2014 Jun 27
3
[LLVMdev] Contributing the Apple ARM64 compiler backend
...rk. Most benchmarks are less than 5% behind GCC. > > Because of the licencing of SPEC, I have to be quite restricted in what I > say and I can't give any numbers - sorry about that. > > We are focussing on Cortex-A57, and the things we've identified so far are: > * The CSEL instruction behaves worse than the equivalent branch structure > in at least one benchmark. In an out of order core, select-like instructions > are going to be slower than their branched equivalent if the branch is > predictable due to CSEL having two dependencies. > > * Redundant...