Displaying 5 results from an estimated 5 matches for "f16_to_fp".
2019 Dec 10
2
TypePromoteFloat loses intermediate rounding operations
Thanks Eli.
I forgot to bring up the strict FP questions which I was working on when I
found this. If we're in a strict FP function, do the fp_to_f16/f16_to_fp
nodes emitted by promoting load/store/bitcast need to be strict versions of
fp_to_f16/f16_to_fp? And if so, where do we get the chain, especially for
the bitcast case, which isn't a chained node?
~Craig
On Tue, Dec 10, 2019 at 3:18 PM Eli Friedman <efriedma at quicinc.com> wrote:
> We cou...
2019 Dec 10
2
TypePromoteFloat loses intermediate rounding operations
...te. So we lost the intermediate rounding between the 2
adds that was in the original clang IR.
I believe this occurs because the TypePromoteFloat legalization converts
all arithmetic operations to their f32 equivalents, but does not place
conversions to/from half around them. Instead fp_to_f16 and f16_to_fp nodes
are only generated at loads, stores, bitcasts, and probably a few other
places: basically, only the places where the 16-bit size is needed to make
the operation possible. What we have is an implementation very similar
to promoting integers, but that doesn't work for FP because
we...
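The effect described in this snippet can be reproduced outside the compiler. The following is a minimal sketch, not the legalizer's actual code: it assumes round-to-nearest-even and uses Python's `struct` half-precision format to stand in for the fp_to_f16/f16_to_fp round trip. The helper names (`round_to_f16`, `add_rounding_each_step`, `add_rounding_once`) are hypothetical and chosen only for illustration.

```python
import struct


def round_to_f16(x: float) -> float:
    """Round x to the nearest IEEE half value (ties to even) and widen it back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def add_rounding_each_step(a: float, b: float, c: float) -> float:
    """Model of two half adds where each result is rounded to half."""
    first = round_to_f16(a + b)
    return round_to_f16(first + c)


def add_rounding_once(a: float, b: float, c: float) -> float:
    """Model of both adds done in a wider type with one final rounding."""
    return round_to_f16((a + b) + c)


# Half values near 2048 are spaced 2 apart, so 2049 is a tie that rounds to 2048.
a, b, c = 2048.0, 1.0, 1.0
print(add_rounding_each_step(a, b, c))  # 2048.0
print(add_rounding_once(a, b, c))       # 2050.0
```

With these inputs the two strategies disagree, which is the kind of difference that arises when the rounding between the two adds is dropped.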
2018 Jan 18
1
[RFC] Half-Precision Support in the Arm Backends
...so:
def VCVTBHS: ASuI<0b11101, 0b11, 0b0010, 0b01, 0, (outs SPR:$Sd), (ins SPR:$Sm),
IIC_fpCVTSH, "vcvtb", ".f32.f16\t$Sd, $Sm",
[]>,
Requires<[HasFP16]>,
Sched<[WriteFPCVT]>;
def : FP16Pat<(f16_to_fp GPR:$a),
(VCVTBHS (COPY_TO_REGCLASS GPR:$a, SPR))>;
def : FullFP16Pat<(f32 (fpextend HPR:$Sm)),
(VCVTBHS (COPY_TO_REGCLASS HPR:$Sm, SPR))>;
I'm not sure of the COPY_TO_REGCLASS semantics, but I would (dangerously) assume that it when it comes to cop...
2018 Jan 18
0
[RFC] Half-Precision Support in the Arm Backends
...0, s0
vmov r0, s0
bx lr
when we don't have the Armv8.2-A FP16 instructions available, and thus only
have the conversion instructions.
The problem is in the conversion rules, some rewrite rules to be more specific,
and I think this is one of the culprits:
def : Pat<(f16_to_fp GPR:$a),
(VCVTBHS (COPY_TO_REGCLASS GPR:$a, SPR))>;
This rewrite rule is supposed to first move the GPR register into an S-register:
vmov s0, r0
and then do the conversion:
vcvtb.f32.f16 s0, s0
This rewrite rule gets triggered because the ISEL DAG has indeed this funny
no...
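As a rough model of what the f16_to_fp pattern above computes, the sketch below treats the general-purpose register as an integer, reinterprets its low 16 bits as an IEEE half, and widens the result. It is only an illustration under those assumptions, not the backend's implementation; the Python function names mirror the node names for readability, and the masking to 16 bits is an assumption about how the upper register bits are treated.

```python
import struct


def f16_to_fp(gpr: int) -> float:
    """Reinterpret the low 16 bits of an integer register as a half and widen it."""
    (value,) = struct.unpack("<e", struct.pack("<H", gpr & 0xFFFF))
    return value


def fp_to_f16(value: float) -> int:
    """Round a float to half (ties to even) and return the 16-bit pattern."""
    (bits,) = struct.unpack("<H", struct.pack("<e", value))
    return bits


print(f16_to_fp(0x3C00))       # 1.0
print(hex(fp_to_f16(-2.0)))    # 0xc000
```

The integer-in, float-out shape is why the pattern needs the register-class copy (the vmov) before the conversion instruction.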
2017 Dec 06
2
[RFC] Half-Precision Support in the Arm Backends
Thanks a lot for the suggestions! I will look into using vld1/vst1, sounds good.
I am custom lowering the bitcasts; that's now the only place where FP_TO_FP16
and FP16_TO_FP nodes are created, to avoid inefficient code generation. I will
double-check whether I can achieve the same without using these nodes (because I
really would like to get rid of them completely).
Cheers,
Sjoerd.