thr3ads.net - search: "t2ldrhi12"

[RFC] Half-Precision Support in the Arm Backends

2018 Jan 18

...nd yes, to make it even funnier, this node has an i32 operand, and that's because we do the half-float load with an integer load instruction. And after this rewrite, we end up with this DAG:

  t0: ch = EntryToken
  t2: i32,ch = CopyFromReg t0, Register:i32 %0
  t16: i32,ch = t2LDRHi12<Mem:LD2[%addr]> t2, TargetConstant:i32<0>, TargetConstant:i32<14>, Register:i32 %noreg, t0
  t20: f16 = COPY_TO_REGCLASS t16, TargetConstant:i32<1>   <~~~~~~~~~~~~~ PROBLEM HERE
  t12: f32 = VCVTBHS t20, TargetConstant:i32<14>, Register:i32 %noreg
  t7: i3...
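As an illustration of the idea in the snippet above (a half-precision value fetched with a 16-bit integer load, then reinterpreted as f16 and widened to f32), here is a minimal Python sketch. It is not LLVM code and does not model the DAG or register classes. The function names `load_half_as_int` and `half_bits_to_float` are hypothetical, chosen only for this example.

```python
import struct


def load_half_as_int(memory: bytes, offset: int) -> int:
    """Mimic a 16-bit integer load: read two bytes and zero-extend them."""
    (bits,) = struct.unpack_from("<H", memory, offset)
    return bits


def half_bits_to_float(bits: int) -> float:
    """Reinterpret 16 raw bits as IEEE 754 binary16 and widen the value."""
    (value,) = struct.unpack("<e", struct.pack("<H", bits))
    return value


# Store 1.5 as a little-endian half-precision value, then read it back
# the "integer way" before converting it to a wider float.
memory = struct.pack("<e", 1.5)
bits = load_half_as_int(memory, 0)
value = half_bits_to_float(bits)
```

The point of the sketch is that the loaded bits are typed as an integer until something explicitly reinterprets them. That explicit step is what the COPY_TO_REGCLASS node in the quoted DAG stands in for.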

[RFC] Half-Precision Support in the Arm Backends

2017 Dec 06

Thanks a lot for the suggestions! I will look into using vld1/vst1; that sounds good. I am custom lowering the bitcasts, which is now the only place where FP_TO_FP16 and FP16_TO_FP nodes are created, to avoid inefficient code generation. I will double-check whether I can achieve the same without using these nodes, because I would really like to get rid of them completely. Cheers, Sjoerd.
