thr3ads.net - llvm dev - [llvm-dev] [AArch64] Generated assembly differs depending on whether debug information is generated or not [Sep 2019]

David Tellenbach via llvm-dev

2019-Sep-26 09:57 UTC

[llvm-dev] [AArch64] Generated assembly differs depending on whether debug information is generated or not

Hi,

we at Arm have noticed that assembly can differ when compiling for AArch64
depending on whether debug information is generated or not.

The issue is reproducible for the following small example compiled with `-O1`
for `aarch64-arm-linux-gnu`:

    a() {
      b(a);
      for (;;)
        c("", b);
    }

The reason for the difference is that AArch64 frame lowering emits CFI
instructions only if debug information is enabled. CFI instructions act as
scheduling boundaries during instruction scheduling, so their presence changes
the scheduling regions and, as a result, the overall instruction schedule.
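
The boundary effect described above can be illustrated with a toy model. This
is only a sketch, not LLVM code: the instruction kinds, the helper
`is_sched_boundary`, and the function `split_regions` are hypothetical names,
and the model simply assumes that CFI pseudo-instructions and calls end a
scheduling region.

```c
#include <assert.h>
#include <stddef.h>

/* Toy instruction kinds; hypothetical, not LLVM's MachineInstr opcodes. */
enum insn_kind { INSN_NORMAL, INSN_CFI, INSN_CALL };

/* Assumption of this model: CFI pseudo-instructions and calls end a region. */
static int is_sched_boundary(enum insn_kind k) {
    return k == INSN_CFI || k == INSN_CALL;
}

/*
 * Split insns[0..n) into maximal runs of non-boundary instructions and
 * store each run's length in sizes[]. Returns the number of regions stored.
 * Boundary instructions themselves belong to no region.
 */
size_t split_regions(const enum insn_kind *insns, size_t n,
                     size_t *sizes, size_t max_regions) {
    size_t count = 0, run = 0;
    assert(insns != NULL && sizes != NULL);
    for (size_t i = 0; i < n; i++) {
        if (!is_sched_boundary(insns[i])) {
            run++;                      /* still inside the current region */
            continue;
        }
        if (run > 0 && count < max_regions)
            sizes[count++] = run;       /* boundary closes a non-empty region */
        run = 0;
    }
    if (run > 0 && count < max_regions)
        sizes[count++] = run;           /* trailing region without a boundary */
    return count;
}
```

In this model, four ordinary instructions followed by a call form one region
of size 4. Placing a single CFI instruction after the second one yields two
regions of size 2, so the scheduler can no longer reorder across that point.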

We see several ways to fix the issue and would welcome comments on this:

  1. Enabling unwind tables by default for AArch64: By enabling unwind tables
     by default, CFI instructions will be inserted in both debug and non-debug
     mode. This should lead to smaller scheduling regions and probably to less
     scheduling potential.

     However, I've measured the average size of scheduling regions for randomly
     generated programs with and without default unwind tables and found an
     average difference of 0.5 to 1 instruction. Other architectures such as x86
     do exactly this and therefore don't face the issue.

     The following patch on Phabricator introduces the said change:
     https://reviews.llvm.org/D68076

  2. Postpone insertion of CFI instructions until after instruction scheduling.
     This would require a new pass running after instruction scheduling that
     inserts CFI instructions if needed. The downside I see is increased
     compile-time and probably some code duplication with frame lowering.

  3. Change instruction scheduling such that CFI instructions get tied together
     with relevant instructions in such a way that they get scheduled together.
     If this could work, it would probably be the cleanest solution.
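
The idea behind option 3 can be sketched as follows. This is a toy model under
stated assumptions, not the implementation discussed in this thread: `struct
insn`, `struct unit`, `glue_cfi`, and `emit_in_order` are hypothetical names,
and the model assumes each CFI pseudo-op belongs to the real instruction
directly before it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy instruction: an id plus a flag marking CFI pseudo-ops. */
struct insn { int id; int is_cfi; };

/* A schedulable unit: in[first .. first+len), one owner plus glued CFI ops. */
struct unit { size_t first; size_t len; };

/*
 * Group each run of CFI ops with the real instruction preceding it.
 * Leading CFI ops with no predecessor form a unit of their own.
 * units[] must have room for n entries. Returns the number of units.
 */
size_t glue_cfi(const struct insn *in, size_t n, struct unit *units) {
    size_t count = 0;
    assert(in != NULL && units != NULL);
    for (size_t i = 0; i < n; i++) {
        if (in[i].is_cfi && count > 0) {
            units[count - 1].len++;     /* extend the previous unit */
        } else {
            units[count].first = i;     /* start a new unit */
            units[count].len = 1;
            count++;
        }
    }
    return count;
}

/* Emit units in the given order; glued CFI ops travel with their owner. */
void emit_in_order(const struct insn *in, const struct unit *units,
                   const size_t *order, size_t nunits, struct insn *out) {
    size_t k = 0;
    for (size_t u = 0; u < nunits; u++) {
        const struct unit *cur = &units[order[u]];
        for (size_t j = 0; j < cur->len; j++)
            out[k++] = in[cur->first + j];
    }
}
```

Because the CFI op is part of its owner's unit, it no longer splits the region
into two, and any reordering of units keeps the CFI op directly behind the
instruction it describes.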

To summarize: Option 1 would make scheduling in the non-debug case behave like
in the debug case and would therefore probably cost some scheduling potential.
However, it would be by far the easiest to implement. Options 2 and 3 would
probably lead to better scheduling but seem to be more complex to implement.

Comments and additional ideas are welcome.

    David

paul.robinson at sony.com via llvm-dev

2019-Sep-26 12:55 UTC

Hi David,

This is PR37240 (https://bugs.llvm.org/show_bug.cgi?id=37240).  I suspect this
problem affects all targets; your patch D68076 would address it only for
AArch64.  Although I would suggest you do some careful measurements to determine
the runtime performance effect, to decide whether this is acceptable.

The more complete approach in your steps 2 + 3 would solve this for all targets,
assuming the solution did not have to be very target-specific.  This would
benefit the entire community.
--paulr

David Tellenbach via llvm-dev

2019-Sep-26 13:57 UTC

Hi Paul,

thanks for your comments.

> This is PR37240 (https://bugs.llvm.org/show_bug.cgi?id=37240).  I suspect this
> problem affects all targets; your patch D68076 would address it only for
> AArch64.  Although I would suggest you do some careful measurements to determine
> the runtime performance effect, to decide whether this is acceptable.

Yes, in principle the problem that instruction scheduling depends on the
presence of CFI instructions should affect more targets than AArch64. However,
this does not imply that all of these targets produce inconsistent assembly
depending on debug information.

> The more complete approach in your steps 2 + 3 would solve this for all targets,
> assuming the solution did not have to be very target-specific.  This would
> benefit the entire community.

At least 2. would require a lot of target-dependent changes because the
insertion of CFI instructions would have to be moved from target-specific frame
lowering into a (probably again target-specific) insertion pass.

        David

Danila Malyutin via llvm-dev

2019-Oct-08 13:43 UTC

Hi David,
I indeed forgot to cc the list.
The last time I checked, the scheduling/tracking of debug values was done
in a best-effort way by simply "remembering" all consecutive dbg
instructions that followed some other instruction in
ScheduleDAGInstrs::buildSchedGraph. This works for most cases, but sometimes it
can produce wrong debug info by rescheduling unrelated instructions (or not
scheduling the related ones) since, IIRC, it's perfectly valid to have
something like

    R1 = ...
    DEBUG_VALUE R1, ...
    <some instruction that doesn't touch R1>
    DEBUG_VALUE R1, <same as above>

For this example, if the first instruction is moved, the first dbg value
would be moved as well (as it should), while the second one would stay after the
second instruction (which would produce wrong dbg info at that point).
If there was a way to properly associate each instruction with all affected
dbg_values wherever they are, it could solve this problem, although there might
be other approaches as well.
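
The failure mode in the four-line listing above can be reproduced in a toy
model. This is a sketch under stated assumptions, not the logic of
ScheduleDAGInstrs::buildSchedGraph: `emit_best_effort` and `first_stale_dbg`
are hypothetical helpers, and a debug value is treated as "stale" when it
describes a register before any definition of that register has been emitted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy instruction kinds, loosely following the listing. */
enum kind { K_DEF, K_OTHER, K_DBG };

struct insn {
    enum kind kind;
    int reg;    /* register defined (K_DEF) or described (K_DBG), else -1 */
};

#define MAX_REGS 32

/*
 * Emit the real instructions named by order[] (indices into in[]); each one
 * is followed by the run of K_DBG entries that directly trailed it in the
 * original sequence. Returns the number of entries written to out[].
 */
size_t emit_best_effort(const struct insn *in, size_t n,
                        const size_t *order, size_t norder,
                        struct insn *out) {
    size_t k = 0;
    for (size_t u = 0; u < norder; u++) {
        size_t i = order[u];
        assert(i < n && in[i].kind != K_DBG);
        out[k++] = in[i];
        for (size_t j = i + 1; j < n && in[j].kind == K_DBG; j++)
            out[k++] = in[j];           /* "remembered" debug values */
    }
    return k;
}

/* Index of the first K_DBG whose register has no earlier K_DEF, or -1. */
long first_stale_dbg(const struct insn *seq, size_t n) {
    int defined[MAX_REGS] = {0};
    for (size_t i = 0; i < n; i++) {
        assert(seq[i].kind == K_OTHER ||
               (seq[i].reg >= 0 && seq[i].reg < MAX_REGS));
        if (seq[i].kind == K_DEF)
            defined[seq[i].reg] = 1;
        else if (seq[i].kind == K_DBG && !defined[seq[i].reg])
            return (long)i;
    }
    return -1;
}
```

With the original order the check finds nothing. If the definition of R1 is
scheduled after the unrelated instruction, the second debug value stays behind
that instruction and now precedes the definition, which is the stale case.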

--
Danila

From: David Tellenbach [mailto:David.Tellenbach at arm.com]
Sent: Saturday, October 5, 2019 17:14
To: Danila Malyutin <Danila.Malyutin at synopsys.com>; paul.robinson at sony.com
Cc: nd <nd at arm.com>
Subject: Re: [AArch64] Generated assembly differs depending on whether debug
information is generated or not

Hi Danila,

sorry for not responding to this. Was this message meant to go to the mailing
list? If so, you probably forgot to CC llvm-dev. Feel free to forward my answer
to the list.

> Step #3 would also likely allow to solve some corner cases where incorrect debug
> info is generated due to some DEBUG_VALUE or similar becoming stale after
> corresponding instruction has been rescheduled.

If I see it correctly, exactly this is already done for debug values
but not for CFI instructions. My current implementation of 3 works very similarly
to the current implementation for debug values.

    David
________________________________
From: Danila Malyutin <Danila.Malyutin at synopsys.com>
Sent: 27 September 2019 15:33
To: paul.robinson at sony.com; David Tellenbach <David.Tellenbach at arm.com>
Cc: nd <nd at arm.com>
Subject: RE: [AArch64] Generated assembly differs depending on whether debug
information is generated or not

Hi,

Step #3 would also likely allow to solve some corner cases where incorrect debug
info is generated due to some DEBUG_VALUE or similar becoming stale after
corresponding instruction has been rescheduled.

--
Danila