thr3ads.net - llvm dev - [llvm-dev] [AArch64] Generated assembly differs depending on whether debug information is generated or not [Oct 2019]

David Tellenbach via llvm-dev

2019-Oct-05 14:03 UTC

[llvm-dev] [AArch64] Generated assembly differs depending on whether debug information is generated or not

Hi Vedant,

thanks for your answer and sorry for the late response.

> It seems like D68076 might not address the underlying issue here (e.g. it
> probably doesn't improve the situation for projects using `-g
> -fno-unwind-tables`?).

Yes, D68076 is a somewhat conservative patch that aligns the behaviour on
AArch64 (for GNU targets) so that the generated assembly is consistent. As you
said, it does not help if unwind tables are explicitly disabled
(`-fno-unwind-tables`). It is conservative because it decreases scheduling
potential (due to smaller scheduling regions) for the non-debug case, but it
fixes the bug of generating inconsistent assembly.

> Would you mind elaborating a bit on your proposals to delay/change CFI
> instruction insertion? In particular, it'd help to hear a bit about how CFI
> instructions are inserted today (is some of it done by CFIInstrInserter, and the
> rest by target-specific frame lowering code?).

CFI instructions are inserted during target-specific frame lowering. The
CFIInstrInserter is only run on X86 targets and seems to verify the correctness
of CFI instructions after they got inserted during X86 frame lowering.

Regarding the 3 possible roadmaps to solve the issue (see my first email), I
currently think 3 (changing instruction scheduling such that CFI instructions
are scheduled together with stack-altering instructions) is the most promising
one because it wouldn't require target-specific changes. Since e.g. X86 and,
if I remember correctly, also AArch64 on Darwin targets insert CFI instructions
in both debug and non-debug mode, solution 3 would increase scheduling
potential for these targets.

To summarize: D68076 would align non-debug mode with debug mode and therefore
potentially *decrease* scheduling potential. Solution 3 would align debug mode
with non-debug mode (in terms of instruction scheduling) and therefore
*increase* scheduling potential.

    David
________________________________
From: vsk at apple.com <vsk at apple.com> on behalf of Vedant Kumar
<vedant_kumar at apple.com>
Sent: 30 September 2019 20:50
To: David Tellenbach <David.Tellenbach at arm.com>
Cc: paul.robinson at sony.com <paul.robinson at sony.com>; llvm-dev at
lists.llvm.org <llvm-dev at lists.llvm.org>; nd <nd at arm.com>; Tim
Northover <tnorthover at apple.com>; Ahmed Bougacha <abougacha at
apple.com>
Subject: Re: [llvm-dev] [AArch64] Generated assembly differs depending on
whether debug information is generated or not

Hi David,

Thanks for looking into this.

It seems like D68076 might not address the underlying issue here (e.g. it
probably doesn't improve the situation for projects using `-g
-fno-unwind-tables`?).

Would you mind elaborating a bit on your proposals to delay/change CFI
instruction insertion? In particular, it'd help to hear a bit about how CFI
instructions are inserted today (is some of it done by CFIInstrInserter, and the
rest by target-specific frame lowering code?).

best,
vedant

On Sep 26, 2019, at 6:57 AM, David Tellenbach via llvm-dev <llvm-dev at
lists.llvm.org> wrote:

Hi Paul,

thanks for your comments.
> This is PR37240 (https://bugs.llvm.org/show_bug.cgi?id=37240).  I suspect this
> problem affects all targets; your patch D68076 would address it only for
> AArch64.  Although I would suggest you do some careful measurements to determine
> the runtime performance effect, to decide whether this is acceptable.

Yes, in principle the problem that instruction scheduling depends on the
presence of CFI instructions should affect more targets than AArch64. However,
this does not imply that all of these targets produce inconsistent assembly
depending on debug information.

> The more complete approach in your steps 2 + 3 would solve this for all targets,
> assuming the solution did not have to be very target-specific.  This would
> benefit the entire community.

At least 2. would require a lot of target-dependent changes because the
insertion of CFI instructions would have to be moved from target-specific frame
lowering into a (probably again target-specific) insertion pass.

        David

On 26/09/2019 13:55, paul.robinson at sony.com wrote:
Hi David,

This is PR37240 (https://bugs.llvm.org/show_bug.cgi?id=37240).  I suspect this
problem affects all targets; your patch D68076 would address it only for
AArch64.  Although I would suggest you do some careful measurements to determine
the runtime performance effect, to decide whether this is acceptable.

The more complete approach in your steps 2 + 3 would solve this for all targets,
assuming the solution did not have to be very target-specific.  This would
benefit the entire community.
--paulr

From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of David
Tellenbach via llvm-dev
Sent: Thursday, September 26, 2019 5:57 AM
To: llvm-dev
Cc: nd
Subject: [llvm-dev] [AArch64] Generated assembly differs depending on whether
debug information is generated or not

Hi,

we at Arm have noticed that assembly can differ when compiling for AArch64
depending on whether debug information is generated or not.

The issue is reproducible for the following small example compiled with `-O1`
for `aarch64-arm-linux-gnu`:

    a() {
      b(a);
      for (;;)
        c("", b);
    }

The reason for the difference is that AArch64 frame lowering emits CFI
instructions if debug information is enabled but not otherwise. CFI instructions
act as scheduling boundaries during instruction scheduling and therefore lead to
differing scheduling regions and an overall different instruction schedule.
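
To make the boundary effect concrete, here is a minimal sketch in C. It is
not LLVM code: the `kind` enum and `count_regions` are hypothetical names used
only for illustration, and the model assumes a straight-line sequence in which
every CFI pseudo-instruction ends the current scheduling region.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical instruction kinds; a real MachineInstr carries far more. */
enum kind { K_ORDINARY, K_STACK, K_CFI };

/*
 * Count the scheduling regions of a straight-line sequence when every CFI
 * pseudo-instruction is treated as a boundary. A region is a maximal run of
 * non-CFI instructions; empty runs are not counted. If `largest` is non-NULL
 * it receives the size of the largest region.
 */
static size_t count_regions(const enum kind *insts, size_t n, size_t *largest)
{
    size_t regions = 0, run = 0, max = 0;

    assert(insts != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (insts[i] == K_CFI) {
            /* A boundary closes the current run, if there is one. */
            if (run > 0)
                regions++;
            if (run > max)
                max = run;
            run = 0;
        } else {
            run++;
        }
    }
    /* Close the trailing run. */
    if (run > 0)
        regions++;
    if (run > max)
        max = run;
    if (largest != NULL)
        *largest = max;
    return regions;
}
```

In this model, a stack adjustment followed by three ordinary instructions
forms one region of four instructions, whereas the same sequence with a CFI
instruction after the stack adjustment forms two regions, the larger with
three instructions. The scheduler then sees different regions in the two modes.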

We see several ways to fix the issue and would welcome comments on this:

  1. Enabling unwind tables by default for AArch64: By enabling unwind tables
     by default, CFI instructions will be inserted in both debug and non-debug
     mode. This should lead to smaller scheduling regions and probably to less
     scheduling potential.

     However, I've measured the average size of scheduling regions for randomly
     generated programs with and without default unwind tables and found an
     average difference of 0.5 to 1 instruction. Other architectures such as x86
     do exactly this and therefore don't face the issue.

     The following patch on Phabricator introduces the said change:
                    https://reviews.llvm.org/D68076

  2. Postpone insertion of CFI instructions until after instruction scheduling.
     This would require a new pass running after instruction scheduling that
     inserts CFI instructions if needed. The downside I see is increased
     compile-time and probably some code duplication with frame lowering.

  3. Change instruction scheduling such that CFI instructions get tied together
     with relevant instructions in such a way that they get scheduled together.
     If this could work, it would probably be the cleanest solution.

To summarize: 1. would make scheduling in the non-debug case behave like in the
debug case and therefore probably cost some scheduling potential. However, it
would be by far the easiest to implement. 2. + 3. would probably lead to
better scheduling but seem to be more complex to implement.

Comments and additional ideas are welcome.

    David


_______________________________________________
LLVM Developers mailing list
llvm-dev at lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Vedant Kumar via llvm-dev

2019-Oct-07 17:31 UTC

Thanks for breaking things down so clearly. My gut instinct would be to push for
changes that make scheduling decisions the same modulo CFI instructions, but I
really don't know how much work that entails, or if it would pay for its own
complexity. OTOH the "option 1" patch you have is an immediate fix.
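
As an illustration of what "the same modulo CFI instructions" could mean as a
check, here is a small C sketch. The `inst` struct and `same_modulo_cfi` are
hypothetical and not part of LLVM; the opcode is just an illustrative integer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: an instruction is an opcode id plus a CFI marker. */
struct inst {
    int opcode;   /* illustrative id, not a real LLVM opcode */
    bool is_cfi;  /* true for CFI pseudo-instructions */
};

/* Return true if the two sequences are identical once CFI entries are skipped. */
static bool same_modulo_cfi(const struct inst *a, size_t na,
                            const struct inst *b, size_t nb)
{
    size_t i = 0, j = 0;

    assert((a != NULL || na == 0) && (b != NULL || nb == 0));
    for (;;) {
        /* Skip CFI entries on both sides. */
        while (i < na && a[i].is_cfi)
            i++;
        while (j < nb && b[j].is_cfi)
            j++;
        /* Equal only if both sequences are exhausted at the same time. */
        if (i == na || j == nb)
            return i == na && j == nb;
        if (a[i].opcode != b[j].opcode)
            return false;
        i++;
        j++;
    }
}
```

A check of this shape, run over the output of a `-g` and a non-`-g` build,
would accept extra CFI entries but reject any reordering of the remaining
instructions.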

CC'ing some folks who probably have more experience working with
CFI/scheduling than me (+ Amara, Adam, Florian).

vedant

David Tellenbach via llvm-dev

2019-Oct-08 13:56 UTC

Hi,

thanks for adding some more people to this thread.

I've just finished a first version of a patch that implements scheduling of
CFI instructions, currently controllable via a flag:

        https://reviews.llvm.org/D68639

Enabling scheduling of CFI instructions by default will currently break some
existing tests. I would like to get this patch accepted with scheduling of CFI
instructions *disabled* by default.

Tests that would currently fail can then be fixed in a follow-up patch and we
could eventually enable CFI instruction scheduling by default (if this gets
community consent).

I've added Paul, Tim and Vedant as reviewers, if someone else is interested
I would be really happy about some feedback.

    David


David Tellenbach via llvm-dev

2019-Oct-10 10:31 UTC

Hi all,

Thinking a bit harder about the problem, I now believe it is not fixable
by changing scheduling, so my patch https://reviews.llvm.org/D68639 becomes
obsolete.

Since a CFI instruction describes the stack with respect to all stack-altering
instructions that precede it, no such stack-altering instruction is allowed to
be scheduled below the CFI instruction. But the opposite is also true: no
stack-altering instruction is allowed to be moved above a CFI instruction.
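
The constraint can be phrased as a check on a proposed schedule. The C sketch
below is only a model under stated assumptions: `kind` and
`keeps_cfi_barriers` are hypothetical names, the schedule is given as a
permutation of original indices, and data dependencies between instructions
are ignored entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical instruction kinds used only for this illustration. */
enum kind { K_ORDINARY, K_STACK, K_CFI };

/*
 * order[p] is the original index of the instruction placed at position p.
 * The schedule is accepted only if no stack-altering instruction changes
 * sides relative to any CFI instruction. Ordinary instructions may move
 * freely in this model.
 */
static bool keeps_cfi_barriers(const enum kind *kinds, const size_t *order,
                               size_t n)
{
    for (size_t p = 0; p < n; p++) {
        for (size_t q = p + 1; q < n; q++) {
            size_t first = order[p], second = order[q];
            bool pair;

            assert(first < n && second < n);
            pair = (kinds[first] == K_STACK && kinds[second] == K_CFI) ||
                   (kinds[first] == K_CFI && kinds[second] == K_STACK);
            /* `first` is scheduled earlier, so it must also come earlier
               in the original order. */
            if (pair && first > second)
                return false;
        }
    }
    return true;
}
```

For the original sequence stack, CFI, ordinary, stack, moving the ordinary
instruction to the front is accepted, while hoisting the second stack-altering
instruction above the CFI instruction, or sinking the first one below it, is
rejected.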

CFI instructions have to act as barriers for stack-altering instructions. The
current situation lets them act as barriers for all instructions; my patch
relaxed this behaviour too much.

By letting CFI instructions act as barriers for stack-altering instructions
only, scheduling could be improved. However, the generated assembly would still
differ from the assembly generated without CFI instructions (e.g. in the
non-debug case).

Next plan: Postpone insertion of CFI instructions until after machine
scheduling. This is option 2 in the original email and will probably require
target-specific code. I'll keep you updated.
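
A rough sketch of the idea in C, under strong simplifying assumptions: the
`kind` enum and `insert_cfi_late` are hypothetical, and the model pretends that
exactly one CFI marker belongs directly after each stack-altering instruction.
Real frame lowering is target specific and emits more varied CFI directives.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical instruction kinds used only for this illustration. */
enum kind { K_ORDINARY, K_STACK, K_CFI };

/*
 * Copy an already scheduled, CFI-free sequence into `out`, adding one CFI
 * marker directly after each stack-altering instruction when `emit_cfi` is
 * set. `out` must have room for 2 * n entries. Returns the output length.
 */
static size_t insert_cfi_late(const enum kind *scheduled, size_t n,
                              bool emit_cfi, enum kind *out)
{
    size_t len = 0;

    for (size_t i = 0; i < n; i++) {
        assert(scheduled[i] != K_CFI); /* input is assumed to be CFI-free */
        out[len++] = scheduled[i];
        if (emit_cfi && scheduled[i] == K_STACK)
            out[len++] = K_CFI;
    }
    return len;
}
```

Because the scheduler only ever sees the CFI-free sequence in this model, the
order of the real instructions is the same whether or not `emit_cfi` is set.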

    David


On 07/10/2019 18:31, Vedant Kumar wrote:


On Oct 5, 2019, at 7:03 AM, David Tellenbach <David.Tellenbach at
arm.com<mailto:David.Tellenbach at arm.com>> wrote:

Hi Vedant,

thanks for your answer and sorry for the late response.
It seems like D68076 might not address the underlying issue here (e.g. it
probably doesn't improve the situation for projects using `-g
-fno-unwind-tables`?).
Yes, D68075 is a somewhat conservative patch that aligns the behaviour on
AArch64 (for GNU targets) that leads to consistent generated assembly. As you
said it does not help if unwind tables are explicitly disabled
(`-fno-unwind-tables`). It is a conservative patch since it decreases scheduling
potential (due to smaller scheduling regions) for the non-debug case but fixes
the bug of generating inconsistent assembly.
Would you mind elaborating a bit on your proposals to delay/change CFI
instruction insertion? In particular, it'd help to hear a bit about how CFI
instructions are inserted today (is some of it done by CFIInstrInserter, and the
rest by target-specific frame lowering code?).
CFI instructions are inserted during target specific frame lowering, the
CFIInstrInserter is only run on X86 targets and seems to verify the correctness
of CFI instructions after the got inserted during X86 frame lowering.

Regarding the 3 possible roadmaps to solve the issue (see my first email) I
currently think 3 (changing instruction scheduling such that CFI instructions
are scheduled together with stack altering instructions) is the most promising
one because it wouldn't require targets specific changes. Since e.g. X86 or
if I remember correctly also AArch64 on Darwin targets insert CFI instructions
in both, debug and non-debug mode, solution 3 would increase scheduling
potential for these targets.

To summarize: D68075 would align non-debug mode with debug mode and therefore
potentially *decrease* scheduling potential. Solution 3 would align debug mode
with non-debug mode (in terms of instruction scheduling) and therefore
*increase* scheduling potential.

Thanks for breaking things down so clearly. My gut instinct would be to push for
changes that make scheduling decisions the same modulo CFI instructions, but I
really don't know how much work that entails, or if it would pay for its own
complexity. OTOH the "option 1" patch you have is an immediate fix.

CC'ing some folks who probably have more experience working with
CFI/scheduling than me (+ Amara, Adam, Florian).

vedant



    David
________________________________
From: vsk at apple.com<mailto:vsk at apple.com> <vsk at
apple.com<mailto:vsk at apple.com>> on behalf of Vedant Kumar
<vedant_kumar at apple.com<mailto:vedant_kumar at apple.com>>
Sent: 30 September 2019 20:50
To: David Tellenbach <David.Tellenbach at arm.com<mailto:David.Tellenbach
at arm.com>>
Cc: paul.robinson at sony.com<mailto:paul.robinson at sony.com>
<paul.robinson at sony.com<mailto:paul.robinson at sony.com>>;
llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org> <llvm-dev
at lists.llvm.org<mailto:llvm-dev at lists.llvm.org>>; nd <nd at
arm.com<mailto:nd at arm.com>>; Tim Northover <tnorthover at
apple.com<mailto:tnorthover at apple.com>>; Ahmed Bougacha
<abougacha at apple.com<mailto:abougacha at apple.com>>
Subject: Re: [llvm-dev] [AArch64] Generated assembly differs depending on
whether debug information is generated or not

Hi David,

Thanks for looking into this.

It seems like D68076 might not address the underlying issue here (e.g. it
probably doesn't improve the situation for projects using `-g
-fno-unwind-tables`?).

Would you mind elaborating a bit on your proposals to delay/change CFI
instruction insertion? In particular, it'd help to hear a bit about how CFI
instructions are inserted today (is some of it done by CFIInstrInserter, and the
rest by target-specific frame lowering code?).

best,
vedant

On Sep 26, 2019, at 6:57 AM, David Tellenbach via llvm-dev <llvm-dev at
lists.llvm.org<mailto:llvm-dev at lists.llvm.org>> wrote:

Hi Paul,

thanks for your comments.
This is PR37240 (https://bugs.llvm.org/show_bug.cgi?id=37240).  I suspect this
problem affects all targets; your patch D68076 would address it only for
AArch64.  Although I would suggest you do some careful measurements to determine
the runtime performance effect, to decide whether this is acceptable.
Yes, in principle the problem that instruction scheduling depends on the
presence of CFI instructions should affect more targets than AArch64. However,
this does not imply that all of these targets produce inconsistent assembly
depending on debug information.

The more complete approach in your steps 2 + 3 would solve this for all targets,
assuming the solution did not have to be very target-specific.  This would
benefit the entire community.
Option 2, at least, would require a lot of target-dependent changes because the
insertion of CFI instructions would have to be moved from target-specific frame
lowering into a (probably again target-specific) insertion pass.

        David

On 26/09/2019 13:55, paul.robinson at sony.com wrote:
Hi David,

This is PR37240 (https://bugs.llvm.org/show_bug.cgi?id=37240).  I suspect this
problem affects all targets; your patch D68076 would address it only for
AArch64.  Although I would suggest you do some careful measurements to determine
the runtime performance effect, to decide whether this is acceptable.

The more complete approach in your steps 2 + 3 would solve this for all targets,
assuming the solution did not have to be very target-specific.  This would
benefit the entire community.
--paulr

From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of David
Tellenbach via llvm-dev
Sent: Thursday, September 26, 2019 5:57 AM
To: llvm-dev
Cc: nd
Subject: [llvm-dev] [AArch64] Generated assembly differs depending on whether
debug information is generated or not

Hi,

we at Arm have noticed that assembly can differ when compiling for AArch64
depending on whether debug information is generated or not.

The issue is reproducible for the following small example compiled with `-O1`
for `aarch64-arm-linux-gnu`:

    a() {
      b(a);
      for (;;)
        c("", b);
    }

The reason for the difference is that AArch64 frame lowering emits CFI
instructions when debug information is enabled but not otherwise. CFI
instructions act as scheduling boundaries during instruction scheduling and
therefore lead to differing scheduling regions and an overall different
instruction schedule.
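
As a rough illustration of the mechanism described above, the following toy
model counts scheduling regions for one instruction sequence, once with CFI
pseudo-instructions present and once without. It is only a sketch under stated
assumptions, not LLVM's scheduler: the `kind` enum, `is_boundary` and
`count_regions` are hypothetical names invented for this example, and treating
calls as the only other kind of boundary is a simplification.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical instruction kinds for this toy model (not LLVM opcodes). */
enum kind { K_NORMAL, K_CFI, K_CALL };

/* Model assumption: CFI pseudo-instructions and calls both end a region. */
static int is_boundary(enum kind k)
{
    return k == K_CFI || k == K_CALL;
}

/*
 * Count scheduling regions in code[0..n): maximal runs of non-boundary
 * instructions. Instructions may only be reordered within one region.
 * If emit_cfi is 0, CFI entries are skipped, as if frame lowering had
 * never emitted them (the non-debug case described in the thread).
 */
static size_t count_regions(const enum kind *code, size_t n, int emit_cfi)
{
    size_t regions = 0;
    int in_region = 0;

    assert(code != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (code[i] == K_CFI && !emit_cfi)
            continue;            /* instruction is absent in this build */
        if (is_boundary(code[i])) {
            in_region = 0;       /* a boundary closes the current region */
        } else if (!in_region) {
            in_region = 1;       /* first instruction of a new region */
            regions++;
        }
    }
    return regions;
}
```

For the sequence normal, CFI, normal, normal, call, normal, this model yields
three regions when the CFI entry is present and two when it is absent. The
first three normal instructions can therefore be reordered freely only in the
build without CFI, which is how the two builds can end up with different
assembly.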

We see several ways to fix the issue and would welcome comments on this:

  1. Enabling unwind tables by default for AArch64: With unwind tables enabled
     by default, CFI instructions will be inserted in both debug and non-debug
     mode. This should lead to smaller scheduling regions and probably to less
     scheduling potential.

     However, I've measured the average size of scheduling regions for randomly
     generated programs with and without default unwind tables and found an
     average difference of 0.5 to 1 instruction. Other architectures such as x86
     do exactly this and therefore don't face the issue.

     The following patch on Phabricator introduces this change:
     https://reviews.llvm.org/D68076

  2. Postpone insertion of CFI instructions until after instruction scheduling.
     This would require a new pass running after instruction scheduling that
     inserts CFI instructions if needed. The downside I see is increased
     compile-time and probably some code duplication with frame lowering.

  3. Change instruction scheduling such that CFI instructions get tied together
     with relevant instructions in such a way that they get scheduled together.
     If this could work it would probably be the cleanest solution.

To summarize:
Option 1 would make scheduling in the non-debug case behave as in the
debug case and therefore probably cost some scheduling potential. However, it
would be by far the easiest to implement. Options 2 and 3 would probably lead to
better scheduling but seem to be more complex to implement.

Comments and additional ideas are welcome.

    David


_______________________________________________
LLVM Developers mailing list
llvm-dev at lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

