Morten Ofstad
2005-Mar-16 13:17 UTC
[LLVMdev] Floating point compare instruction selection
Hello, I didn't get any reply to my previous mail about adding floating point intrinsics to the X86 pattern instruction selector... And I could really need some help. Anyway, I think my confusion was caused partly by an already existing bug in the instruction selection for floating point compares. The case which emits code for the special case of comparing against constant 0.0 does not return after generating it's code, so the normal compare is also generated! As far as I can tell it should return right after this: BuildMI(BB, X86::SAHF, 1); instead it falls through and goes on to generate the normal compare instruction... Am I right? m.
Chris Lattner
2005-Mar-16 16:24 UTC
[LLVMdev] Floating point compare instruction selection
On Wed, 16 Mar 2005, Morten Ofstad wrote:> Hello, > > I didn't get any reply to my previous mail about adding floating point > intrinsics to the X86 pattern instruction selector... And I could really need > some help.Sorry about that, it slipped through the cracks. :(> Anyway, I think my confusion was caused partly by an already > existing bug in the instruction selection for floating point compares.Okay, please note that the pattern isel for X86 is not 100% stable or finished. It does work in many cases however, but it probably still has some bugs. You can track the status of the X86 pattern isel by comparing the LLC column to the LLC-beta column here: http://llvm.cs.uiuc.edu/testresults/X86/ The pattern selector currently has some code-quality issues compared to the simple isel in some cases (due to ugly hacks in the simple isel that have not yet been replicated). When I get a chance to work on finishing it up, it will become the default X86 isel.> The case which emits code for the special case of comparing against constant > 0.0 does not return after generating it's code, so the normal compare is also > generated! As far as I can tell it should return right after this: > > BuildMI(BB, X86::SAHF, 1); > > instead it falls through and goes on to generate the normal compare > instruction... Am I right?Nope. It's emitting a compare against zero with fucomi instead of ftst. fucomi is a PPRO+ instruction that deposits the result of the comparison into the integer condition codes. This saves having to use SAHF and some bit-twiddling, so it's usually a bit faster than using ftst. The simple isel used to produce ftst for compare against zero. If you do some benchmarking and find that one is noticably faster than the other, we should switch them both to use the same code sequence. -Chris -- http://nondot.org/sabre/ http://llvm.cs.uiuc.edu/
Morten Ofstad
2005-Mar-17 09:22 UTC
[LLVMdev] Floating point compare instruction selection
Chris Lattner wrote:> On Wed, 16 Mar 2005, Morten Ofstad wrote: >> The case which emits code for the special case of comparing against >> constant 0.0 does not return after generating it's code, so the normal >> compare is also generated! As far as I can tell it should return right >> after this: >> >> BuildMI(BB, X86::SAHF, 1); >> >> instead it falls through and goes on to generate the normal compare >> instruction... Am I right? > > Nope. It's emitting a compare against zero with fucomi instead of ftst. > fucomi is a PPRO+ instruction that deposits the result of the comparison > into the integer condition codes. This saves having to use SAHF and some > bit-twiddling, so it's usually a bit faster than using ftst. The simple > isel used to produce ftst for compare against zero. If you do some > benchmarking and find that one is noticably faster than the other, we > should switch them both to use the same code sequence.It's generating _both_ the SAHF and the fucomi -- look at the code ISelPattern generates: 17160443 call HueVMReadCommands_LLVMReadVoxel (19BB229h) 17160448 fsub dword ptr ds:[161D6280h] 1716044E fabs 17160450 fst qword ptr [esp+14h] 17160454 ftst 17160456 fstp st(0) 17160458 fnstsw ax 1716045A sahf 1716045B fldz 1716045D fchs 1716045F fld qword ptr [esp+14h] 17160463 fucomip st,st(1) 17160465 fstp st(0) 17160467 jbe 17160498 the ISelSimple generates: 1716047F call eax 17160481 fsub dword ptr ds:[16237240h] 17160487 fabs 17160489 fst qword ptr [esp+14h] 1716048D ftst 1716048F fstp st(0) 17160491 fnstsw ax 17160493 sahf 17160494 jbe 171604C7 m.