After a long time I managed to make some progress with this problem. I can store and load fp16 values as i16 in some registers and do an add instruction. The problem now is that this messes up the real i16 types (short, unsigned short). I have:

def FADD_H : NemaCorePseudo<(outs HGR16:$fd), (ins HGR16:$fs, HGR16:$ft),
    "add.h\t$fd, $fs, $ft",
    [(set (i16 HGR16:$fd),
          (i16 (f32_to_f16 (f32 (fadd (f32 (f16_to_f32 (i16 HGR16:$fs))),
                                      (f32 (f16_to_f32 (i16 HGR16:$ft))))))))]>;

So I can add two half-precision variables with a half-precision add instruction, and it seems to work fine.

--
View this message in context: http://llvm.1065342.n5.nabble.com/Half-Float-fp16-Native-Support-tp50665p54026.html
Sent from the LLVM - Dev mailing list archive at Nabble.com.
> def FADD_H : NemaCorePseudo< (outs HGR16:$fd), (ins HGR16:$fs, HGR16:$ft),
> "add.h\t$fd, $fs, $ft", [(set (i16 HGR16:$fd),(i16 (f32_to_f16 (f32 (fadd
> (f32 (f16_to_f32 (i16 HGR16:$fs))),
> (f32 (f16_to_f32 (i16 HGR16:$ft))))))))]>;
>
> so i can have a half floating point add two half point variables and seems
> to work fine.

This does not look right. Note that you're matching the f16_to_f32 intrinsics and friends. They are used for storage-only half FP, and you're trying to match them instead of native fp16. In short: you need to generate IR with proper fp16 arithmetic, not via the storage-only wrappers.

--
With best regards, Anton Korobeynikov
Faculty of Mathematics and Mechanics, Saint Petersburg State University
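To make the distinction concrete, here is a minimal sketch in LLVM IR of the two forms. The function names are made up for illustration, and the intrinsic spelling is the one from older LLVM releases; as far as I know, later releases add a type suffix (for example `.f32`), so check the LangRef for your version.

```llvm
; Storage-only half: values travel as i16 and every operation goes through
; the conversion intrinsics, so the arithmetic the backend sees is an f32 fadd.
declare float @llvm.convert.from.fp16(i16)
declare i16 @llvm.convert.to.fp16(float)

define i16 @add_storage_only(i16 %a, i16 %b) {
entry:
  %fa = call float @llvm.convert.from.fp16(i16 %a)
  %fb = call float @llvm.convert.from.fp16(i16 %b)
  %sum = fadd float %fa, %fb
  %res = call i16 @llvm.convert.to.fp16(float %sum)
  ret i16 %res
}

; Native half: the operands are typed half, so instruction selection sees
; an f16 fadd directly, with no conversions around it.
define half @add_native(half %a, half %b) {
entry:
  %sum = fadd half %a, %b
  ret half %sum
}
```

With IR of the second form, a pattern along the lines of `(set HGR16:$fd, (fadd HGR16:$fs, HGR16:$ft))` could select add.h, assuming HGR16 is declared with f16 as its value type and the target marks f16 as legal. That would presumably also keep real i16 values out of HGR16.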
I understand that this is not right, but it was the only way I found to avoid the f32 fadd ("add.s") and use "add.h" instead. Whatever I tried, LLVM moved everything to the float registers and emitted add.s rather than the half-precision add.h. Is there any trick to do that? I tried a lot, but with no luck.