Hello, I am trying to make the FP intrinsics (abs, sin, cos, sqrt) I've added work with the X86ISelPattern, but I'm having some difficulties understanding what needs to be done. I assume I have to add new nodetypes for the FP instructions to SelectionDAGNodes.h, and make nodes for these in SelectionDAGLowering::visitCall when I find the intrinsic... The part I don't quite understand is what to do for targets that don't have these instructions (although I'm only interested in X86 myself, I would like to see these patches in the official LLVM version as it's some work to maintain them) -- for me it would make most sense to lower the intrinsic to a call if it's not supported. However I notice that for other intrinsics (memcpy etc.) this is done in LegalizeDAG where the node is expanded to a call if it's not directly supported for the target. To illustrate what I have been doing so far I attach the diff to this mail -- if someone could have a look at it and tell me what needs to be done to get it working I would be very grateful... (Note that if you use the X86ISelSimple it's already working...) m. -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: temp.patch URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20050311/ee9c30d4/attachment.ksh>
Update: I have been working on this all day, and I finally got it working more or less with the pattern instruction selector... However, the generated code is not very good, and I haven't implemented the expand to calls if the target does not support these FP instructions. As an example, in the following function the sub abs and compare compiles to 13 instructions! Also it has changed the result of the abs to be a 64 bit double when it stores it, while my intention was to get a 32 bit float. For comparision I included the code generated by the X86ISelSimple which is only 8 instructions... it seems the compare is being generated twice in two different ways with the pattern selector?! I also attached the diff of my current version m. internal void %EvaluatePoint3D65() { EntryBlock: store float 0.000000e+000, float* getelementptr ([4 x float]* cast (<4 x float>* %_ARGB56 to [4 x float]*), int 0, int 0) store float 0.000000e+000, float* getelementptr ([4 x float]* cast (<4 x float>* %_ARGB56 to [4 x float]*), int 0, int 1) store float 0.000000e+000, float* getelementptr ([4 x float]* cast (<4 x float>* %_ARGB56 to [4 x float]*), int 0, int 2) store float 0.000000e+000, float* getelementptr ([4 x float]* cast (<4 x float>* %_ARGB56 to [4 x float]*), int 0, int 3) %Value = call float %ReadVoxel( void* %_hVB59 ) ; <float> [#uses=1] %arg2 = load float* %T64 ; <float> [#uses=1] %VMCommandSubtract = sub float %Value, %arg2 ; <float> [#uses=1] %VMCommandAbs = call float %llvm.abs( float %VMCommandSubtract ) ; <float> [#uses=2] %isNonZero = setgt float %VMCommandAbs, 0.000000e+000 ; <bool> [#uses=1] br bool %isNonZero, label %NonZero, label %Zero NonZero: ; preds = %EntryBlock call void %Shader1DLookupLinear( <4 x float>* %_ARGB56, void* %_hS60, float %VMCommandAbs, void* %_hContext3D58 ) ret void Zero: ; preds = %EntryBlock store float 0.000000e+000, float* getelementptr ([4 x float]* cast (<4 x float>* %_ARGB56 to [4 x float]*), int 0, int 0) store float 0.000000e+000, float* getelementptr ([4 x float]* cast (<4 x float>* %_ARGB56 to [4 x float]*), int 0, int 1) store float 0.000000e+000, float* getelementptr ([4 x float]* cast (<4 x float>* %_ARGB56 to [4 x float]*), int 0, int 2) store float 0.000000e+000, float* getelementptr ([4 x float]* cast (<4 x float>* %_ARGB56 to [4 x float]*), int 0, int 3) ret void } Generated with ISelPattern: 17160410 sub esp,1Ch 17160413 mov dword ptr ds:[161D6240h],0 1716041D mov dword ptr ds:[161D6244h],0 17160427 mov dword ptr ds:[161D6248h],0 17160431 mov dword ptr ds:[161D624Ch],0 1716043B mov eax,76E4560h 17160440 mov dword ptr [esp],eax 17160443 call HueVMReadCommands_LLVMReadVoxel (19BB229h) 17160448 fsub dword ptr ds:[161D6280h] 1716044E fabs 17160450 fst qword ptr [esp+14h] 17160454 ftst 17160456 fstp st(0) 17160458 fnstsw ax 1716045A sahf 1716045B fldz 1716045D fchs 1716045F fld qword ptr [esp+14h] 17160463 fucomip st,st(1) 17160465 fstp st(0) 17160467 jbe 17160498 1716046D mov eax,76E4F60h 17160472 mov dword ptr [esp+0Ch],eax 17160476 fld qword ptr [esp+14h] 1716047A fstp dword ptr [esp+8] 1716047E mov eax,15900060h 17160483 mov dword ptr [esp+4],eax 17160487 mov eax,161D6240h 1716048C mov dword ptr [esp],eax 1716048F call HueVMShaderCommands_LLVMShader1DLookupLinear (19D8B76h) 17160494 add esp,1Ch 17160497 ret 17160498 mov dword ptr ds:[161D6240h],0 171604A2 mov dword ptr ds:[161D6244h],0 171604AC mov dword ptr ds:[161D6248h],0 171604B6 mov dword ptr ds:[161D624Ch],0 171604C0 add esp,1Ch 171604C3 ret Generated with ISelSimple: 17160440 sub esp,1Ch 17160443 mov eax,16237200h 17160448 mov dword ptr [eax],0 1716044E mov eax,16237200h 17160453 mov dword ptr [eax+4],0 1716045A mov eax,16237200h 1716045F mov dword ptr [eax+8],0 17160466 mov eax,16237200h 1716046B mov dword ptr [eax+0Ch],0 17160472 mov eax,19BB229h 17160477 mov ecx,76E4560h 1716047C mov dword ptr [esp],ecx 1716047F call eax 17160481 fsub dword ptr ds:[16237240h] 17160487 fabs 17160489 fst qword ptr [esp+14h] 1716048D ftst 1716048F fstp st(0) 17160491 fnstsw ax 17160493 sahf 17160494 jbe 171604C7 1716049A mov eax,19D8B76h 1716049F mov ecx,16237200h 171604A4 mov dword ptr [esp],ecx 171604A7 mov ecx,15900060h 171604AC mov dword ptr [esp+4],ecx 171604B0 fld qword ptr [esp+14h] 171604B4 fstp dword ptr [esp+8] 171604B8 mov ecx,76E4F60h 171604BD mov dword ptr [esp+0Ch],ecx 171604C1 call eax 171604C3 add esp,1Ch 171604C6 ret 171604C7 mov eax,16237200h 171604CC mov dword ptr [eax],0 171604D2 mov eax,16237200h 171604D7 mov dword ptr [eax+4],0 171604DE mov eax,16237200h 171604E3 mov dword ptr [eax+8],0 171604EA mov eax,16237200h 171604EF mov dword ptr [eax+0Ch],0 171604F6 add esp,1Ch 171604F9 ret -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: temp.patch URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20050311/82268a2c/attachment.ksh>
On Fri, 11 Mar 2005, Morten Ofstad wrote:> Hello, > > I am trying to make the FP intrinsics (abs, sin, cos, sqrt) I've added work > with the X86ISelPattern, but I'm having some difficulties understanding what > needs to be done.Cool. Here are a couple of requests: 1. I don't think we need an "llvm.abs" intrinsic at the llvm level. This can be modeled as a pair of setcc/select instructions. 2. My objection to llvm.abs does not apply to the FP_ABS node you added: it is fine. Additionally, the target-independent selection dag machinery should be the one that notices the relevant setcc/select pairs at the llvm level to fabricate the FP_ABS node. 3. On X86 at least, sin and cos are not defined over the full numeric range. These instructions are useful for applications like yours, and situations where a flag like "-ffast-math" has been provided. Because of this, please name the intrinsics and nodes sin_approx and cos_approx. I don't think that sqrt on the X86 has this limitation, so its intrinsic can be named just "llvm.sqrt". 4. Don't forget a doc patch to docs/LangRef.html :-)> I assume I have to add new nodetypes for the FP > instructions to SelectionDAGNodes.h, and make nodes for these in > SelectionDAGLowering::visitCall when I find the intrinsic...This looks good.> The part I don't quite understand is what to do for targets that don't have > these instructions (although I'm only interested in X86 myself, I would like > to see these patches in the official LLVM version as it's some work to > maintain them)Ok, sounds good.> -- for me it would make most sense to lower the intrinsic to a > call if it's not supported. However I notice that for other intrinsics > (memcpy etc.) this is done in LegalizeDAG where the node is expanded to a > call if it's not directly supported for the target.Yup, this is what we want to do. There are two places to implement this: LegalizeDAG for the SelectionDAG isels, and lib/CodeGen/IntrinsicLowering.cpp for other isels.> To illustrate what I have been doing so far I attach the diff to this mail -- > if someone could have a look at it and tell me what needs to be done to get > it working I would be very grateful... (Note that if you use the > X86ISelSimple it's already working...)One of the problems that jumps out at me in the pattern version is that you have code that looks like this: + case Intrinsic::sqrt: + setValue(&I, DAG.getNode(ISD::FP_SQRT, MVT::f64, + getValue(I.getOperand(1)))); In your case, you're passing in an float, which is 32-bits. You probably want the type argument for this to be whatever the input type is, not hard coded to f64. -Chris -- http://nondot.org/sabre/ http://llvm.cs.uiuc.edu/
On Fri, 11 Mar 2005, Morten Ofstad wrote:> Update: I have been working on this all day, and I finally got it working > more or less with the pattern instruction selector... However, the generated > code is not very good, and I haven't implemented the expand to calls if the > target does not support these FP instructions.Oh, I see you fixed the F64 thing, sorry, disregard that part of the previous email :-/> As an example, in the following function the sub abs and compare compiles to > 13 instructions! Also it has changed the result of the abs to be a 64 bit > double when it stores it, while my intention was to get a 32 bit float. > For comparision I included the code generated by the X86ISelSimple which is > only 8 instructions... it seems the compare is being generated twice in two > different ways with the pattern selector?!I'm not sure if the specific problem you're talking about here is the comparison-against-zero issue, or something else. Can you reduce it to a smaller test case or annotate the source with the problem you're hitting? Thanks, -Chris> I also attached the diff of my current version > > m. > > internal void %EvaluatePoint3D65() { > EntryBlock: > store float 0.000000e+000, float* getelementptr ([4 x float]* cast > (<4 x float>* %_ARGB56 to [4 x float]*), int 0, int 0) > store float 0.000000e+000, float* getelementptr ([4 x float]* cast > (<4 x float>* %_ARGB56 to [4 x float]*), int 0, int 1) > store float 0.000000e+000, float* getelementptr ([4 x float]* cast > (<4 x float>* %_ARGB56 to [4 x float]*), int 0, int 2) > store float 0.000000e+000, float* getelementptr ([4 x float]* cast > (<4 x float>* %_ARGB56 to [4 x float]*), int 0, int 3) > %Value = call float %ReadVoxel( void* %_hVB59 ) ; <float> > [#uses=1] > %arg2 = load float* %T64 ; <float> [#uses=1] > %VMCommandSubtract = sub float %Value, %arg2 ; <float> > [#uses=1] > %VMCommandAbs = call float %llvm.abs( float %VMCommandSubtract ) > ; <float> [#uses=2] > %isNonZero = setgt float %VMCommandAbs, 0.000000e+000 ; > <bool> [#uses=1] > br bool %isNonZero, label %NonZero, label %Zero > > NonZero: ; preds = %EntryBlock > call void %Shader1DLookupLinear( <4 x float>* %_ARGB56, void* %_hS60, > float %VMCommandAbs, void* %_hContext3D58 ) > ret void > > Zero: ; preds = %EntryBlock > store float 0.000000e+000, float* getelementptr ([4 x float]* cast > (<4 x float>* %_ARGB56 to [4 x float]*), int 0, int 0) > store float 0.000000e+000, float* getelementptr ([4 x float]* cast > (<4 x float>* %_ARGB56 to [4 x float]*), int 0, int 1) > store float 0.000000e+000, float* getelementptr ([4 x float]* cast > (<4 x float>* %_ARGB56 to [4 x float]*), int 0, int 2) > store float 0.000000e+000, float* getelementptr ([4 x float]* cast > (<4 x float>* %_ARGB56 to [4 x float]*), int 0, int 3) > ret void > } > > Generated with ISelPattern: > > 17160410 sub esp,1Ch > 17160413 mov dword ptr ds:[161D6240h],0 > 1716041D mov dword ptr ds:[161D6244h],0 > 17160427 mov dword ptr ds:[161D6248h],0 > 17160431 mov dword ptr ds:[161D624Ch],0 > 1716043B mov eax,76E4560h > 17160440 mov dword ptr [esp],eax > 17160443 call HueVMReadCommands_LLVMReadVoxel (19BB229h) > 17160448 fsub dword ptr ds:[161D6280h] > 1716044E fabs > 17160450 fst qword ptr [esp+14h] > 17160454 ftst > 17160456 fstp st(0) > 17160458 fnstsw ax > 1716045A sahf > 1716045B fldz > 1716045D fchs > 1716045F fld qword ptr [esp+14h] > 17160463 fucomip st,st(1) > 17160465 fstp st(0) > 17160467 jbe 17160498 > 1716046D mov eax,76E4F60h > 17160472 mov dword ptr [esp+0Ch],eax > 17160476 fld qword ptr [esp+14h] > 1716047A fstp dword ptr [esp+8] > 1716047E mov eax,15900060h > 17160483 mov dword ptr [esp+4],eax > 17160487 mov eax,161D6240h > 1716048C mov dword ptr [esp],eax > 1716048F call HueVMShaderCommands_LLVMShader1DLookupLinear (19D8B76h) > 17160494 add esp,1Ch > 17160497 ret > 17160498 mov dword ptr ds:[161D6240h],0 > 171604A2 mov dword ptr ds:[161D6244h],0 > 171604AC mov dword ptr ds:[161D6248h],0 > 171604B6 mov dword ptr ds:[161D624Ch],0 > 171604C0 add esp,1Ch > 171604C3 ret > > Generated with ISelSimple: > > 17160440 sub esp,1Ch > 17160443 mov eax,16237200h > 17160448 mov dword ptr [eax],0 > 1716044E mov eax,16237200h > 17160453 mov dword ptr [eax+4],0 > 1716045A mov eax,16237200h > 1716045F mov dword ptr [eax+8],0 > 17160466 mov eax,16237200h > 1716046B mov dword ptr [eax+0Ch],0 > 17160472 mov eax,19BB229h > 17160477 mov ecx,76E4560h > 1716047C mov dword ptr [esp],ecx > 1716047F call eax > 17160481 fsub dword ptr ds:[16237240h] > 17160487 fabs > 17160489 fst qword ptr [esp+14h] > 1716048D ftst > 1716048F fstp st(0) > 17160491 fnstsw ax > 17160493 sahf > 17160494 jbe 171604C7 > 1716049A mov eax,19D8B76h > 1716049F mov ecx,16237200h > 171604A4 mov dword ptr [esp],ecx > 171604A7 mov ecx,15900060h > 171604AC mov dword ptr [esp+4],ecx > 171604B0 fld qword ptr [esp+14h] > 171604B4 fstp dword ptr [esp+8] > 171604B8 mov ecx,76E4F60h > 171604BD mov dword ptr [esp+0Ch],ecx > 171604C1 call eax > 171604C3 add esp,1Ch > 171604C6 ret > 171604C7 mov eax,16237200h > 171604CC mov dword ptr [eax],0 > 171604D2 mov eax,16237200h > 171604D7 mov dword ptr [eax+4],0 > 171604DE mov eax,16237200h > 171604E3 mov dword ptr [eax+8],0 > 171604EA mov eax,16237200h > 171604EF mov dword ptr [eax+0Ch],0 > 171604F6 add esp,1Ch > 171604F9 ret >-Chris -- http://nondot.org/sabre/ http://llvm.cs.uiuc.edu/
Chris Lattner wrote:> On Fri, 11 Mar 2005, Morten Ofstad wrote: >> I am trying to make the FP intrinsics (abs, sin, cos, sqrt) I've added >> work with the X86ISelPattern, but I'm having some difficulties >> understanding what needs to be done. > > Cool. Here are a couple of requests: > > 1. I don't think we need an "llvm.abs" intrinsic at the llvm level. This > can be modeled as a pair of setcc/select instructions.OK, I'm not exactly sure where and how to do this matching of setcc/select - can you give me some pointers?> 2. My objection to llvm.abs does not apply to the FP_ABS > node you added: it is fine. Additionally, the target-independent > selection dag machinery should be the one that notices the relevant > setcc/select pairs at the llvm level to fabricate the FP_ABS node.OK> 3. On X86 at least, sin and cos are not defined over the full numeric > range. These instructions are useful for applications like yours, and > situations where a flag like "-ffast-math" has been provided. Because > of this, please name the intrinsics and nodes sin_approx and > cos_approx. I don't think that sqrt on the X86 has this limitation, > so its intrinsic can be named just "llvm.sqrt".I think it makes more sense to have the intrinsics as is, but do the code generation in the X86 target different depending on some command line flag. For the pattern ISel, this simply amounts to registering the FP_SIN and FP_COS as nodes which need to be expanded (to calls) if not fast-math is set... What do you think about this approach?> 4. Don't forget a doc patch to docs/LangRef.html :-) > >> I assume I have to add new nodetypes for the FP instructions to >> SelectionDAGNodes.h, and make nodes for these in >> SelectionDAGLowering::visitCall when I find the intrinsic...OK, I will.>> -- for me it would make most sense to lower the intrinsic to a call if >> it's not supported. However I notice that for other intrinsics (memcpy >> etc.) this is done in LegalizeDAG where the node is expanded to a call >> if it's not directly supported for the target. > > Yup, this is what we want to do. There are two places to implement > this: LegalizeDAG for the SelectionDAG isels, and > lib/CodeGen/IntrinsicLowering.cpp for other isels.Why not do it in SelectionDAGLowering::visitCall? The way I have implemented it now, this calls TLI.hasNativeSupportForOperation to see if it (for example FP_SQRT) is a legal operation on the target, and if not it sets RenameFn to "sin" and simply goes ahead with generating the call. This is a lot less code than doing it in LegalizeDAG, and also more efficient since it's not first lowered to an instruction and then expanded to a call. m.
On Wed, 2005-03-16 at 10:38, Chris Lattner wrote:> On Fri, 11 Mar 2005, Morten Ofstad wrote: > > I am trying to make the FP intrinsics (abs, sin, cos, sqrt) I've added work > > with the X86ISelPattern, but I'm having some difficulties understanding what > > needs to be done.I have some notes for Alpha below, if you are so inclined :)> 1. I don't think we need an "llvm.abs" intrinsic at the llvm level. This > can be modeled as a pair of setcc/select instructions.I agree.> 2. My objection to llvm.abs does not apply to the FP_ABS > node you added: it is fine. Additionally, the target-independent > selection dag machinery should be the one that notices the relevant > setcc/select pairs at the llvm level to fabricate the FP_ABS node.A FP_ABS node is nice, alpha has abs (cyps F31, src, dst)> 3. On X86 at least, sin and cos are not defined over the full numeric > range. These instructions are useful for applications like yours, and > situations where a flag like "-ffast-math" has been provided. Because > of this, please name the intrinsics and nodes sin_approx and > cos_approx. I don't think that sqrt on the X86 has this limitation, > so its intrinsic can be named just "llvm.sqrt".I think it is the same with alpha's sqrt. Though sin and cos will have to be lowered.> > The part I don't quite understand is what to do for targets that don't have > > these instructions (although I'm only interested in X86 myself, I would like > > to see these patches in the official LLVM version as it's some work to > > maintain them)So alpha has sqrt and abs, but not sin and cos. Once you have the SelectionDag making the nodes for the intrinsics, let me know. Andrew