Cyril Six via llvm-dev
2022-Jan-05 11:44 UTC
[llvm-dev] Implicit Defs and Uses are ignored by pre-RA schedulers
Hello,

In our Kalray LLVM backend, we have builtins to get and set system registers. One of them is $CS, which has sticky bits enforcing rounding mode or storing masked floating-point exceptions. The equivalent on AArch64 would be FPCR.

In our user code, we would like to preserve the partial ordering between a SET to $CS and a floating-point operation, since the SET to $CS might be modifying the rounding mode. Similarly, we would like to preserve the partial ordering between a GET from $CS and a floating-point operation, since a user code might want to examine the floating-point exception bits right after a given floating-point operation.

Another use-case we have is the following: we have a coprocessor that is turned on by setting a given bit on a system register. This can be accessed by a builtin. Such SET instruction must happen before using a coprocessor instruction - the compiler should not break that dependency when reordering instructions.

We have tried to implement this by using implicit Defs and implicit Uses in our instruction definitions, using for example `Defs = [CS] in` and `Uses = [CS]` where relevant in our Target Description files.

I have been running some experiments, examining the scheduling outputs and the dependencies (using VLIWScheduler in pre-RA, PostRASchedulerList in post-RA, and a child of VLIWPacketizerList for bundling). I have found that the implicit defs and uses are indeed taken into account by the post-RA schedulers. However, they seem to be ignored by the pre-RA schedulers. Also, they do not appear as dependencies in the SelectionDAG.

If I look at what some other backends did, AArch64 does not seem to model anything on FPCR. PowerPC sets MFFS as scheduling barrier (isSchedulingBoundary) to prevent floating-point instructions being ordered above it - but isSchedulingBoundary seems to be only used by post-RA schedulers; pre-RA schedulers do not seem to care about that.
The bad consequence for us: our programmers have to encapsulate the SET instructions (touching system registers) in non-inlined functions to enforce the compiler not breaking anything.

We are looking for advice on how to treat this problem - we have possible leads, like modifying the SelectionDAG to recover these dependencies, or modifying the schedulers to scan the SelectionDAG and enforce the source order when such dependency is detected (maybe by having a look at how SourceScheduler works), but we have not yet investigated it fully. Any such advice would be greatly appreciated.

Also, another related issue: it would seem that the flag -ffp-exception-behavior=strict does not preserve the exception semantics like it says it does. Although the generated IR seems to preserve it, there does not seem to be anything in the LLVM backends enforcing the "strict" floating-point exception behavior.

That last point can be witnessed in that piece of code: https://godbolt.org/z/e96zP7jET

```
long fpcr;

int toto(float a, float b, float c, double d, double e){
    float bc = b + c; // first faddd
    asm("mrs %[result], FPCR" : [result] "=r" (fpcr) : :);
    float abc = a + bc; // second faddd
    float dw = (float) d; // fwidenlwd : should not happen before the second faddd
    float ew = (float) e;
    int dw_ewl = (int) dw + (int) ew;
    int abcl_dw_ewl = (int) abc + dw_ewl;
    return abcl_dw_ewl;
}
```

Compiling this piece of code with clang 11.0.0 for ARMv8-a gives the following assembly code:

```
toto:
        fadd    s1, s1, s2
        fcvt    s2, d3
        fadd    s0, s1, s0
        fcvt    s3, d4
        fcvtzs  w9, s2
        fcvtzs  w10, s0
        add     w9, w10, w9
        fcvtzs  w10, s3
        add     w0, w9, w10
        adrp    x9, fpcr
        //APP
        mrs     x8, FPCR
        //NO_APP
        str     x8, [x9, :lo12:fpcr]
        ret
```

Notice that mrs was moved below - which does not seem to preserve the floating-point exception semantics of the compiled code.
PS: apologies for the double message, if any; I sent the first to llvm-dev-bounces by mistake.

Best regards,
Cyril Six
Compiler Engineer, Kalray
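The ordering asked for in this message can be pictured with a small dependency model. The following is only a minimal sketch in C, not LLVM code: `Instr`, `REG_CS`, `must_stay_ordered` and `schedule_is_legal` are hypothetical names invented for illustration, and the model assumes that the only constraints are read-after-write, write-after-read and write-after-write dependencies on implicitly accessed registers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model (not an LLVM data structure).
   Each bit of a mask stands for one physical register; bit 0 is CS. */
enum { REG_CS = 1 << 0 };

typedef struct {
    const char *name;
    unsigned implicit_defs; /* registers written implicitly */
    unsigned implicit_uses; /* registers read implicitly */
} Instr;

/* True if `later` must stay after `earlier`: RAW, WAR or WAW on a shared
   register. Two instructions that only read never constrain each other. */
bool must_stay_ordered(const Instr *earlier, const Instr *later)
{
    bool raw = (earlier->implicit_defs & later->implicit_uses) != 0;
    bool war = (earlier->implicit_uses & later->implicit_defs) != 0;
    bool waw = (earlier->implicit_defs & later->implicit_defs) != 0;
    return raw || war || waw;
}

/* `prog` is the instruction sequence in source order. `order[k]` is the
   source index of the instruction placed in slot k of the schedule.
   Returns true if every dependent pair keeps its source order. */
bool schedule_is_legal(const Instr *prog, const size_t *order, size_t n)
{
    for (size_t a = 0; a < n; ++a) {
        for (size_t b = a + 1; b < n; ++b) {
            size_t first = order[a];  /* scheduled earlier */
            size_t second = order[b]; /* scheduled later */
            assert(first < n && second < n);
            /* The pair was swapped relative to source order:
               that is only allowed if the two are independent. */
            if (second < first &&
                must_stay_ordered(&prog[second], &prog[first]))
                return false;
        }
    }
    return true;
}
```

With a three-instruction sequence SET (defs CS), FADD (uses CS), GET (uses CS), this model rejects hoisting FADD above SET, but it accepts swapping FADD and GET, because both only read CS. Under these assumptions, the GET ordering is only enforced if the floating-point instruction is also described as defining CS (it updates the sticky exception bits), in which case the swap is rejected as well.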
Wang, Phoebe via llvm-dev
2022-Jan-05 14:44 UTC
[llvm-dev] Implicit Defs and Uses are ignored by pre-RA schedulers
Did you try `hasSideEffects = 1`?

I'm not familiar with AArch64. On X86, we have separate FPCR and FPSR: the former is used for control (rounding, exception mask) and the latter is for status. We modeled all FP instructions that may raise an exception by setting `mayRaiseFPException = 1` and having them use FPCR.

Note that an instruction reading FPCR is just another use of FPCR, not a def. So it is not necessary to keep the read instruction in source order; only a write to FPCR needs that. I guess AArch64 behaves this way for the same reason? Maybe you can check what happens with a write to FPCR.

Thanks
Phoebe
Kevin Neal via llvm-dev
2022-Jan-06 20:32 UTC
[llvm-dev] Implicit Defs and Uses are ignored by pre-RA schedulers
Correct. You do need to add the required support to your backend.

The X86, PowerPC, and SystemZ backends have basically complete support. The PowerPC backend has a fix to not reschedule floating-point instructions around function calls if the rounding mode may change. I haven't heard that the other two have this fix. AArch64 and RISC-V support are both a work in progress, so one of the three fully-supported targets is best to examine and emulate.

Also be aware that optimization of strict floating-point is a work in progress, so be prepared for not-so-great performance.

Lastly, there's currently no way to have machine-specific LLVM intrinsics respect "strict" mode. A fix has been proposed, but I don't think anything has been implemented.

It might have been clang 12 where a warning was introduced that told you that "strict" floating-point doesn't work for that target and is therefore disabled. I don't remember exactly which release first had this.

--
Kevin P. Neal
SAS/C and SAS/C++ Compiler
Compute Services
SAS Institute, Inc.