Displaying 8 results from an estimated 8 matches for "csel".
2014 Jun 26
2
[LLVMdev] Contributing the Apple ARM64 compiler backend
...mark. Most benchmarks are less than 5% behind GCC.
>
> Because of the licencing of SPEC, I have to be quite restricted in what I
> say and I can't give any numbers - sorry about that.
>
> We are focussing on Cortex-A57, and the things we've identified so far are:
> * The CSEL instruction behaves worse than the equivalent branch structure
> in at least one benchmark. In an out of order core, select-like instructions
> are going to be slower than their branched equivalent if the branch is
> predictable due to CSEL having two dependencies.
>
> * Redun...
2014 Jun 26
2
[LLVMdev] Contributing the Apple ARM64 compiler backend
..., and 25%
ahead on one benchmark. Most benchmarks are less than 5% behind GCC.
Because of the licencing of SPEC, I have to be quite restricted in what I
say and I can't give any numbers - sorry about that.
We are focussing on Cortex-A57, and the things we've identified so far are:
* The CSEL instruction behaves worse than the equivalent branch structure
in at least one benchmark. In an out of order core, select-like instructions
are going to be slower than their branched equivalent if the branch is
predictable due to CSEL having two dependencies.
* Redundant calculations inside if c...
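As a rough illustration of the trade-off quoted above (not code from the thread), the two C++ functions below compute the same result in a select-like form and in a branched form. The names `sum_select` and `sum_branch` are hypothetical, and whether a given compiler actually emits CSEL for the first or a conditional branch for the second depends on the compiler and its options.

```cpp
#include <cstdint>
#include <vector>

// Select-like form: both candidate values exist before the choice is made.
// If this is lowered to CSEL, the new `acc` depends on both inputs of the
// select (`next` and the old `acc`) as well as on the comparison.
std::int64_t sum_select(const std::vector<int>& v, int limit) {
    std::int64_t acc = 0;
    for (int x : v) {
        std::int64_t next = acc + x;
        acc = (x < limit) ? next : acc;
    }
    return acc;
}

// Branched form: if this is lowered to a conditional branch and the branch
// predicts well, only the taken path feeds the next iteration.
std::int64_t sum_branch(const std::vector<int>& v, int limit) {
    std::int64_t acc = 0;
    for (int x : v) {
        if (x < limit) {
            acc += x;
        }
    }
    return acc;
}
```

The two forms are interchangeable in terms of results; the snippet's point is only about which one an out-of-order core may execute faster when the condition is predictable.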
2014 Jun 24
5
[LLVMdev] Contributing the Apple ARM64 compiler backend
Eric Christopher <echristo <at> gmail.com> writes:
>
> > The big pain issues I see merging from ARM64 to AArch64 are:
> > 1. Apple have created a fairly complete scheduling model already for
> > ARM64, and we'd have to merge the partial? model in AArch64 and theirs. We
> > risk regressing performance on Apple's targets here, and we can't
2014 Sep 02
3
[LLVMdev] LICM promoting memory to scalar
...cbz w0, .L1
adrp x6, globalvar
add w5, w0, w0, lsr 31
ldr w3, [x6,#:lo12:globalvar] <== hoist load of globalvar
mov w2, 0
asr w5, w5, 1
.L4:
cmp w5, w2
add w2, w2, w1
add w4, w3, w1
csel w3, w4, w3, hi
cmp w2, w0
bcc .L4
str w3, [x6,#:lo12:globalvar] <== sink store of globalvar
.L1:
ret
.cfi_endproc
.LFE0:
.size _Z3fooii, .-_Z3fooii
.ident "GCC: (crosstool-NG linaro-1.13.1-4.8-2014.01 - Linaro GCC 2...
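The hoisted load and sunk store marked in the assembly above can be sketched at source level as follows. This is a hand-written approximation, not the source from the thread: the function bodies, the names `foo_memory` and `foo_promoted`, and the use of signed `int` loop variables are assumptions made for illustration, and `step` is assumed to be positive so the loops terminate.

```cpp
int globalvar = 0;

// Memory form: the global is read and written inside the loop.
void foo_memory(int n, int step) {
    for (int i = 0; i < n; i += step) {
        if (i < n / 2) {
            globalvar += step;
        }
    }
}

// Promoted form: the global is loaded once before the loop, kept in a
// local (a register candidate), and stored once after the loop.
void foo_promoted(int n, int step) {
    if (n <= 0) {
        return;                      // loop would not run; memory untouched
    }
    int tmp = globalvar;             // hoisted load
    for (int i = 0; i < n; i += step) {
        tmp = (i < n / 2) ? tmp + step : tmp;   // select instead of a branch
    }
    globalvar = tmp;                 // sunk store
}
```

Note that in the promoted form the store to `globalvar` happens whenever the loop is entered, even if the condition inside the loop never holds.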
2014 Sep 02
2
[LLVMdev] LICM promoting memory to scalar
...> ldr w3, [x6,#:lo12:globalvar] <== hoist load of globalvar
>> mov w2, 0
>> asr w5, w5, 1
>> .L4:
>> cmp w5, w2
>> add w2, w2, w1
>> add w4, w3, w1
>> csel w3, w4, w3, hi
>> cmp w2, w0
>> bcc .L4
>> str w3, [x6,#:lo12:globalvar] <== sink store of globalvar
>> .L1:
>> ret
>> .cfi_endproc
>> .LFE0:
>> .size _Z3fooii,...
2020 Jan 23
3
How to find out the default CPU / Features String for a given triple?
...lxnum,+crc,-crypto,-custom-cheap-as-move,-cyclone,-disable-latency-sched-heuristic,+dit,+dotprod,-exynos-cheap-as-move,-exynosm1,-exynosm2,-exynosm3,-exynosm4,-falkor,+fmi,-force-32bit-jump-tables,+fp-armv8,-fp16fml,+fptoint,-fullfp16,-fuse-address,+fuse-aes,-fuse-arith-logic,-fuse-crypto-eor,-fuse-csel,-fuse-literals,+jsconv,-kryo,+lor,+lse,-lsl-fast,+mpam,-mte,+neon,-no-neg-immediates,+nv,+pa,+pan,+pan-rwv,+perfmon,-predictable-select-expensive,+predres,-rand,+ras,+rasv8_4,+rcpc,+rcpc-immo,+rdm,-reserve-x1,-reserve-x10,-reserve-x11,-reserve-x12,-reserve-x13,-reserve-x14,-reserve-x15,-reserve-x18...
2014 Sep 03
3
[LLVMdev] LICM promoting memory to scalar
... adrp x6, globalvar
add w5, w0, w0, lsr 31
ldr w3, [x6,#:lo12:globalvar] <== hoist load of globalvar
mov w2, 0
asr w5, w5, 1
.L4:
cmp w5, w2
add w2, w2, w1
add w4, w3, w1
csel w3, w4, w3, hi
cmp w2, w0
bcc .L4
str w3, [x6,#:lo12:globalvar] <== sink store of globalvar
.L1:
ret
.cfi_endproc
.LFE0:
.size _Z3fooii, .-_Z3fooii
.ident "GCC: (crosstool-NG linaro-1.13.1-4.8-20...
2014 Jun 27
3
[LLVMdev] Contributing the Apple ARM64 compiler backend
...rk. Most benchmarks are less than 5% behind GCC.
>
> Because of the licencing of SPEC, I have to be quite restricted in what I
> say and I can't give any numbers - sorry about that.
>
> We are focussing on Cortex-A57, and the things we've identified so far are:
> * The CSEL instruction behaves worse than the equivalent branch structure
> in at least one benchmark. In an out of order core, select-like instructions
> are going to be slower than their branched equivalent if the branch is
> predictable due to CSEL having two dependencies.
>
> * Redundant...