Displaying 4 results from an estimated 4 matches for "rev64".
2018 Apr 26
1
[Constant Folder, InstCombine, ARM, AArch64] Question about constant folding of vector load
...ld like to optimize at the IR level.
I'd like to turn an Arm/AArch64 table lookup intrinsic that takes a constant vector mask into a shufflevector instruction:
vtbl1(V,mask) ~> shufflevector(V,undef,mask)
The reason is that if the mask is {7,6,5,4,3,2,1,0}, then the backend will generate rev64 instructions instead.
If the mask comes from a vld1 of a global constant I could fold it to allow the above instruction combining.
My question is, does the constant folding of the vld1 seem a good thing to do in the general case, as a standalone transformation, or only when used as a mask for a t...
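To make the proposed rewrite concrete, here is a minimal Python model of the three operations involved. It is a sketch under stated assumptions, not LLVM or NEON code: the function names `vtbl1`, `shufflevector` and `rev64_bytes` are illustrative stand-ins that model only lane-level semantics on lists of byte values, and `None` stands in for undef lanes.

```python
def vtbl1(table, indices):
    """Model of an 8-byte table lookup: an out-of-range index yields 0."""
    return [table[i] if i < len(table) else 0 for i in indices]


def shufflevector(v1, v2, mask):
    """Model of shufflevector: the mask indexes the concatenation v1 ++ v2."""
    concat = v1 + v2
    return [concat[i] for i in mask]


def rev64_bytes(v):
    """Model of rev64 on byte lanes: reverse the bytes in each 8-byte chunk."""
    out = []
    for start in range(0, len(v), 8):
        out.extend(reversed(v[start:start + 8]))
    return out


# Hypothetical input vector and the reversing mask from the message above.
V = [0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17]
UNDEF = [None] * 8  # placeholder for the undef second operand
MASK = [7, 6, 5, 4, 3, 2, 1, 0]

lookup = vtbl1(V, MASK)
shuffled = shufflevector(V, UNDEF, MASK)
reversed_lanes = rev64_bytes(V)

# With every mask index below 8, all three models agree.
assert lookup == shuffled == reversed_lanes
```

In this model the equivalence holds only while every mask index is below 8: the table lookup returns 0 for an out-of-range index, whereas the shuffle would read an undef lane, so a real fold would presumably need to check the constant mask first.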
2017 Nov 17
2
[GlobalISel][AArch64] Toward flipping the switch for O0: Please give it a try!
...// %entry
> sub sp, sp, #16 // =16
> str x0, [sp, #8]
> ldr x0, [sp, #8]
> ld1 { v0.2s }, [x0]
> add sp, sp, #16 // =16
> ret
>
> With global-isel off, there is a rev64 instruction between the ld1 and the add, which fixes up the endianness of the vector.
>
> Oliver
>
> From: Oliver Stannard
> Sent: 17 November 2017 13:32
> To: 'qcolombet at apple.com <mailto:qcolombet at apple.com>'
> Cc: llvm-dev at lists.llvm.org <mailt...
2017 Nov 27
2
[GlobalISel][AArch64] Toward flipping the switch for O0: Please give it a try!
...// %entry
> sub sp, sp, #16 // =16
> str x0, [sp, #8]
> ldr x0, [sp, #8]
> ld1 { v0.2s }, [x0]
> add sp, sp, #16 // =16
> ret
>
> With global-isel off, there is a rev64 instruction between the ld1 and the add, which fixes up the endianness of the vector.
>
> Oliver
>
> From: Oliver Stannard
> Sent: 17 November 2017 13:32
> To: 'qcolombet at apple.com <mailto:qcolombet at apple.com>'
> Cc: llvm-dev at lists.llvm.org <mailt...
2017 Nov 14
6
[GlobalISel][AArch64] Toward flipping the switch for O0: Please give it a try!
To give an update here, we actually are not missing a mapping. The code complains because we are copying around a fp16 into a gpr32 and that shouldn’t be done with a copy (default mapping).
I extended the repairing code to issue G_ANYEXT in those cases instead of asserting.
However, I now have to teach instruction select about those ANYEXT; otherwise we’ll fall back in that case. But that’s a