> i understand that is not right but this was the only way not to use the fadd
> for f32 "add.s" and use the "add.h" what ever i tried llvm moved everything
> to the float registers and did add.s and not the half add.h
It seems you do not understand the issue.
Half-precision floating-point operations can be done in two ways:
1. Storage-only (fp16 is used only to store values; all operations are
performed on floats). In this mode a special intrinsic is used for the
f32 <-> f16 conversions (it is lowered to a native instruction on ARM
NEON, for example).
2. Native fp16.
Note that in both modes the *frontend* is involved, because in case 1
it has to generate the appropriate conversions where necessary.
It seems that you have IR from case 1, but you really want to work in
mode 2. So generate proper IR (with native fp16 operations, not the
storage-only form) and almost all your problems will go away.
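
To make the difference concrete, here is a rough sketch of what the IR
for a single addition might look like in each mode. The function names
are made up for illustration, and the exact spelling of the conversion
intrinsics depends on the LLVM version (older releases use the
unsuffixed @llvm.convert.to.fp16 / @llvm.convert.from.fp16), so treat
this as an assumption to check against your tree, not as the output of
any particular frontend.

```llvm
; Mode 1: storage-only. The half values travel as i16, and the
; arithmetic itself is done on float (this is what ends up as add.s).
declare float @llvm.convert.from.fp16.f32(i16)
declare i16 @llvm.convert.to.fp16.f32(float)

define i16 @add_storage_only(i16 %a, i16 %b) {
  %af = call float @llvm.convert.from.fp16.f32(i16 %a)
  %bf = call float @llvm.convert.from.fp16.f32(i16 %b)
  %sum = fadd float %af, %bf
  %r = call i16 @llvm.convert.to.fp16.f32(float %sum)
  ret i16 %r
}

; Mode 2: native fp16. The fadd is on the half type directly, so the
; backend has a chance to select a half-precision add.
define half @add_native(half %a, half %b) {
  %sum = fadd half %a, %b
  ret half %sum
}
```

If your IR looks like the first function, the backend is doing what it
was asked to do when it picks the f32 add.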
--
With best regards, Anton Korobeynikov
Faculty of Mathematics and Mechanics, Saint Petersburg State University