thr3ads.net - llvm dev - [llvm-dev] Implicit Defs and Uses are ignored by pre-RA schedulers [Jan 2022]

Cyril Six via llvm-dev

2022-Jan-05 11:44 UTC

[llvm-dev] Implicit Defs and Uses are ignored by pre-RA schedulers

Hello,

In our Kalray LLVM backend, we have builtins to get and set system registers.
One of them is $CS, which has sticky bits enforcing rounding mode or storing
masked floating-point exceptions. The equivalent on AArch64 would be FPCR.

In our user code, we would like to preserve the partial ordering between a SET
to $CS and a floating-point operation, since the SET to $CS might be modifying
the rounding mode. Similarly, we would like to preserve the partial ordering
between a GET from $CS and a floating-point operation, since user code might
want to examine the floating-point exception bits right after a given
floating-point operation.

Another use-case we have is the following: we have a coprocessor that is turned
on by setting a given bit on a system register. This can be accessed by a
builtin. Such a SET instruction must execute before any coprocessor instruction;
the compiler should not break that dependency when reordering instructions.

We have tried to implement this by using implicit Defs and implicit Uses in our
instruction definitions, using for example `Defs = [CS] in` and `Uses = [CS]`
where relevant in our Target Description files.

I have been running some experiments, examining the scheduling outputs and the
dependencies (using VLIWScheduler in pre-RA, PostRASchedulerList in post-RA, and
a child of VLIWPacketizerList for bundling).

I have found that the implicit defs and uses are indeed taken into account by
the post-RA schedulers. However, they seem to be ignored by the pre-RA
schedulers. Also, they do not appear as dependencies in the SelectionDAG.
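
As a rough illustration of the ordering that implicit Defs and Uses are expected to impose, the sketch below reduces each instruction to two bitmasks and checks for read-after-write, write-after-read, and write-after-write hazards. This is a toy model, not LLVM code: `Instr`, `REG_CS`, `must_stay_ordered`, and the three sample instructions are hypothetical names chosen for this example, and the FP operation is assumed to only read `CS`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical register numbering for this sketch; only CS is modelled. */
enum { REG_CS = 1 };

/* An instruction reduced to the registers it implicitly reads and writes. */
typedef struct {
    const char *name;
    uint32_t uses; /* bitmask of registers read    */
    uint32_t defs; /* bitmask of registers written */
} Instr;

/* True if `first` and `second` (in that program order) must not be swapped:
 * read-after-write, write-after-read, or write-after-write on any register. */
static bool must_stay_ordered(const Instr *first, const Instr *second) {
    assert(first && second);
    bool raw = (first->defs & second->uses) != 0;
    bool war = (first->uses & second->defs) != 0;
    bool waw = (first->defs & second->defs) != 0;
    return raw || war || waw;
}

/* Sample instructions: SET writes CS, GET and the FP operation read it. */
static const Instr SET_CS = { "set $cs", 0,      REG_CS };
static const Instr GET_CS = { "get $cs", REG_CS, 0      };
static const Instr FADD   = { "fadd",    REG_CS, 0      };
```

In this model `must_stay_ordered(&SET_CS, &FADD)` holds in both directions, while two `FADD`s are free to swap. `GET_CS` and `FADD` are also free to swap, since both only read `CS`; keeping a GET after a given FP operation would presumably require the FP instruction to be modelled as defining `CS` as well.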

If I look at what some other backends did, AArch64 does not seem to model
anything on FPCR. PowerPC sets MFFS as scheduling barrier (isSchedulingBoundary)
to prevent floating-point instructions being ordered above it - but
isSchedulingBoundary seems to be only used by post-RA schedulers; pre-RA
schedulers do not seem to care about that.

The unfortunate consequence for us: our programmers have to wrap the SET
instructions (those touching system registers) in non-inlined functions to keep
the compiler from breaking these dependencies.

We are looking for advice on how to treat this problem - we have possible leads,
like modifying the SelectionDAG to recover these dependencies, or modifying the
schedulers to scan the SelectionDAG and enforce the source order when such
dependency is detected (maybe by having a look at how SourceScheduler works),
but we have not yet investigated it fully.

Any such advice would be greatly appreciated.

Also, another related issue: the flag -ffp-exception-behavior=strict does not
seem to preserve the exception semantics as it claims to. Although the
generated IR seems to preserve them, there does not seem to be anything in the
LLVM backends enforcing the "strict" floating-point exception behavior.

This last point can be seen in the following piece of code:
https://godbolt.org/z/e96zP7jET

```
long fpcr;

int toto(float a, float b, float c, double d, double e){
  float bc = b + c; // first faddd
  asm("mrs %[result], FPCR" : [result] "=r" (fpcr) : :);
  float abc = a + bc; // second faddd
  float dw = (float) d; // fwidenlwd : should not happen before the second faddd
  float ew = (float) e;
  int dw_ewl = (int) dw + (int) ew;
  int abcl_dw_ewl = (int) abc + dw_ewl;
  return abcl_dw_ewl;
}
```

Compiling this piece of code with clang 11.0.0 for ARMv8-a gives the following
assembly code:
```
toto:
        fadd    s1, s1, s2
        fcvt    s2, d3
        fadd    s0, s1, s0
        fcvt    s3, d4
        fcvtzs  w9, s2
        fcvtzs  w10, s0
        add     w9, w10, w9
        fcvtzs  w10, s3
        add     w0, w9, w10
        adrp    x9, fpcr
        //APP
        mrs     x8, FPCR
        //NO_APP
        str     x8, [x9, :lo12:fpcr]
        ret
```

Notice that the mrs was sunk below all the floating-point operations, which
does not seem to preserve the floating-point exception semantics of the
source code.
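
The reordering can be checked mechanically with a small sketch. It assumes a hypothetical model in which every FP operation both reads and writes one combined control/status register (standing in for $CS), and the `mrs` only reads it. The names `Instr`, `REG_FPSTATE`, `depends`, `schedule_preserves_deps`, and the `TOTO` table are invented for this illustration and are not part of LLVM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { REG_FPSTATE = 1 }; /* hypothetical combined control/status register */

typedef struct {
    const char *name;
    uint32_t uses; /* registers read    */
    uint32_t defs; /* registers written */
} Instr;

/* RAW, WAR, or WAW hazard between a (earlier) and b (later). */
static bool depends(const Instr *a, const Instr *b) {
    return (a->defs & b->uses) || (a->uses & b->defs) || (a->defs & b->defs);
}

/* order[k] is the source index of the instruction emitted at position k.
 * Returns true if every dependent pair keeps its source order. */
static bool schedule_preserves_deps(const Instr *src, size_t n,
                                    const size_t *order) {
    size_t pos[16];
    assert(n <= 16);
    for (size_t k = 0; k < n; ++k) {
        assert(order[k] < n);
        pos[order[k]] = k;
    }
    for (size_t i = 0; i < n; ++i)
        for (size_t j = i + 1; j < n; ++j)
            if (depends(&src[i], &src[j]) && pos[i] > pos[j])
                return false;
    return true;
}

/* The FP-relevant statements of toto(), in source order. */
static const Instr TOTO[] = {
    { "fadd (b + c)",   REG_FPSTATE, REG_FPSTATE },
    { "mrs FPCR",       REG_FPSTATE, 0           },
    { "fadd (a + bc)",  REG_FPSTATE, REG_FPSTATE },
    { "fcvt (float) d", REG_FPSTATE, REG_FPSTATE },
};
static const size_t SOURCE_ORDER[]  = { 0, 1, 2, 3 };
/* Order seen in the assembly above: fadd, fcvt, fadd, ..., mrs. */
static const size_t EMITTED_ORDER[] = { 0, 3, 2, 1 };
```

Under these assumptions, `schedule_preserves_deps(TOTO, 4, SOURCE_ORDER)` is true and the same call with `EMITTED_ORDER` is false: the `mrs` ends up after the second `fadd`, and the `fcvt` ends up before it.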

PS: apologies for the double message, if any; I sent the first to
llvm-dev-bounces by mistake.

Best regards,

Cyril Six
Compiler Engineer • Kalray

Wang, Phoebe via llvm-dev

2022-Jan-05 14:44 UTC

[llvm-dev] Implicit Defs and Uses are ignored by pre-RA schedulers

Did you try `hasSideEffects = 1`?
I’m not familiar with AArch64. On X86, we have separate FPCR and FPSR: the
former is used for control (rounding mode, exception masks) and the latter for
status. We model all FP instructions that may raise an exception with
`mayRaiseFPException = 1` and an implicit use of FPCR. Note that an instruction
reading FPCR is just another use of FPCR, not a def, so a read does not need to
keep its source order relative to the FP instructions; only a write to FPCR
does. I guess the same reasoning applies to AArch64? Maybe you can check what
happens with a write to FPCR.

Thanks
Phoebe


Kevin Neal via llvm-dev

2022-Jan-06 20:32 UTC

[llvm-dev] Implicit Defs and Uses are ignored by pre-RA schedulers

Correct. You do need to add the required support to your backend.

The X86, PowerPC, and SystemZ backends have basically complete support.
The PowerPC backend has a fix to not reschedule floating-point instructions
around function calls if the rounding mode may change. I haven't heard
that the other two have this fix. AArch64 and RISC-V support are both a
work in progress so one of the three fully-supported targets is best to
examine and emulate.

Also be aware that optimization of strict floating-point is a work in
progress, so be prepared for not-so-great performance.

Lastly, there's currently no way to have machine-specific llvm intrinsics
respect "strict" mode. A fix has been proposed, but I don't think anything
has been implemented.


It might have been clang 12 where a warning was introduced that told you
that "strict" floating-point doesn't work for that target and is therefore
disabled. I don't remember exactly which release first had this.
--
Kevin P. Neal
SAS/C and SAS/C++ Compiler
Compute Services
SAS Institute, Inc.



