thr3ads.net - llvm dev - [llvm-dev] [RFC] Asynchronous unwind tables attribute [Nov 2021]

Momchil Velikov via llvm-dev

2021-Nov-17 11:18 UTC

[llvm-dev] [RFC] Asynchronous unwind tables attribute

On one hand, we have the `uwtable` attribute in LLVM IR, which tells
whether to emit CFI directives. On the other hand, we have the `clang
-cc1` command-line option `-funwind-tables=1|2` and the codegen
option `VALUE_CODEGENOPT(UnwindTables, 2, 0) ///< Unwind tables (1) or
asynchronous unwind tables (2)`.
Thus, on the way from clang to LLVM, we lose the information whether we
want plain (synchronous) unwind tables or asynchronous unwind tables.

Asynchronous unwind tables take more space in the runtime image, I'd
estimate something like 80-90% more, as the difference is adding
roughly the same number of CFI directives as for prologues, only a bit
simpler (e.g. `.cfi_offset reg, off` vs. `.cfi_restore reg`). Or even
more, if you consider tail duplication of epilogue blocks.
Asynchronous unwind tables could also restrict code generation to
having only a finite number of frame pointer adjustments (an example
of *not* having a finite number of `SP` adjustments is on AArch64:
when untagging the stack (MTE), in some cases the compiler can modify
`SP` in a loop).
Having the CFI precise up to an instruction generally also means one
cannot bundle together CFI instructions once the prologue is done,
they need to be interspersed with ordinary instructions, which means
extra `DW_CFA_advance_loc` commands, further increasing the unwind
tables size.
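
To make the size argument concrete, here is a minimal sketch in C of how many bytes the "advance location" opcodes alone contribute. The opcode sizes follow the DWARF call frame instruction encoding; the helper names `advance_loc_size` and `total_advance_bytes` are made up for illustration, and offsets are assumed to be expressed in code-alignment units.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encoded size, in bytes, of the DWARF CFA opcode needed to advance the
 * location by `delta` code-alignment units. */
static size_t advance_loc_size(uint32_t delta) {
    if (delta == 0) return 0;       /* same location: no advance emitted */
    if (delta < 0x40) return 1;     /* DW_CFA_advance_loc: delta in the low 6 bits */
    if (delta <= 0xff) return 2;    /* DW_CFA_advance_loc1: opcode + 1-byte delta */
    if (delta <= 0xffff) return 3;  /* DW_CFA_advance_loc2: opcode + 2-byte delta */
    return 5;                       /* DW_CFA_advance_loc4: opcode + 4-byte delta */
}

/* Total bytes spent on advance opcodes when CFI instructions are attached
 * to the given ascending code offsets (hypothetical helper). */
static size_t total_advance_bytes(const uint32_t *offsets, size_t n) {
    size_t total = 0;
    uint32_t prev = 0;
    for (size_t i = 0; i < n; ++i) {
        assert(offsets[i] >= prev); /* offsets must be sorted */
        total += advance_loc_size(offsets[i] - prev);
        prev = offsets[i];
    }
    return total;
}
```

With three CFI instructions bundled at offset 2, only one advance byte is needed; with the same three placed after instructions 1, 2 and 3, each needs its own advance.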

That is to say, async unwind tables impose a non-negligible overhead,
yet for the most common use cases (like C++ exceptions), they are not
even needed.

We could, for example, extend the `uwtable` attribute with an optional
value, e.g.
  -  `uwtable` (default to 2)
  -  `uwtable(1)`, sync unwind tables
  -  `uwtable(2)`, async unwind tables
  -  `uwtable(3)`, async unwind tables, but tracking only a subset of
registers (e.g. CFA and return address)

Or add a new attribute `async_uwtable`.
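
As a rough illustration of the first option, the sketch below maps the proposed attribute spellings onto an enumeration. Nothing here is existing LLVM API: the enum, its names, and `parse_uwtable` are hypothetical, and the bare `uwtable` spelling is assumed to default to 2 as listed above.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical levels, numbered as in the proposal above. */
enum uwtable_kind {
    UWTABLE_NONE = 0,          /* attribute absent */
    UWTABLE_SYNC = 1,          /* uwtable(1) */
    UWTABLE_ASYNC = 2,         /* uwtable(2), also the bare spelling */
    UWTABLE_ASYNC_MINIMAL = 3  /* uwtable(3): subset of registers only */
};

/* Parse one attribute spelling; returns -1 if it is not recognized. */
static int parse_uwtable(const char *attr) {
    assert(attr != NULL);
    if (strcmp(attr, "uwtable") == 0) return UWTABLE_ASYNC;
    if (strcmp(attr, "uwtable(1)") == 0) return UWTABLE_SYNC;
    if (strcmp(attr, "uwtable(2)") == 0) return UWTABLE_ASYNC;
    if (strcmp(attr, "uwtable(3)") == 0) return UWTABLE_ASYNC_MINIMAL;
    return -1;
}
```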

Other suggestions? Comments?

~chill

--
Compiler scrub, Arm

Fāng-ruì Sòng via llvm-dev

2021-Nov-20 08:26 UTC

[llvm-dev] [RFC] Asynchronous unwind tables attribute

On Wed, Nov 17, 2021 at 3:19 AM Momchil Velikov via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
>
> On one hand, we have the `uwtable` attribute in LLVM IR, which tells
> whether to emit CFI directives. On the other hand, we have the `clang
> -cc1` command-line option `-funwind-tables=1|2 ` and the codegen
> option `VALUE_CODEGENOPT(UnwindTables, 2, 0) ///< Unwind tables (1) or
> asynchronous unwind tables (2)`.
> Thus we lose along the way the information whether we want just some
> unwind tables or asynchronous unwind tables.
Thanks for starting the topic. I am very interested in the topic and
would like to see that CFI gets improved.

I have looked into -funwind-tables/-fasynchronous-unwind-tables and
made some relatively simple changes (default to
-fasynchronous-unwind-tables for aarch64/ppc, fix
-f(no-)unwind-tables/-f(no-)asynchronous-unwind-tables, make
-fno-asynchronous-unwind-tables work with instrumentation, add
`-funwind-tables=1|2`), but haven't done anything at the IR level.
It's good to see that someone is picking up the heavy lifting so that I
don't need to do it :)
That said, if you need a reviewer or help on some work items, feel
free to offload some to me.
> Asynchronous unwind tables take more space in the runtime image, I'd
> estimate something like 80-90% more, as the difference is adding
> roughly the same number of CFI directives as for prologues, only a bit
> simpler (e.g. `.cfi_offset reg, off` vs. `.cfi_restore reg`). Or even
> more, if you consider tail duplication of epilogue blocks.
> Asynchronous unwind tables could also restrict code generation to
> having only a finite number of frame pointer adjustments (an example
> of *not* having a finite number of `SP` adjustments is on AArch64 when
> untagging the stack (MTE) in some cases the compiler can modify `SP`
> in a loop).
The restriction on MTE is new to me as I don't know much about MTE yet.
>
> Having the CFI precise up to an instruction generally also means one
> cannot bundle together CFI instructions once the prologue is done,
> they need to be interspersed with ordinary instructions, which means
> extra `DW_CFA_advance_loc` commands, further increasing the unwind
> tables size.
>
> That is to say, async unwind tables impose a non-negligible overhead,
> yet for the most common use cases (like C++ exceptions), they are not
> even needed.
>
> We could, for example, extend the `uwtable` attribute with an optional
> value, e.g.
>   -  `uwtable` (default to 2)
>   -  `uwtable(1)`, sync unwind tables
>   -  `uwtable(2)`, async unwind tables
>   -  `uwtable(3)`, async unwind tables, but tracking only a subset of
> registers (e.g. CFA and return address)
>
> Or add a new attribute `async_uwtable`.
>
> Other suggestions? Comments?
I have thought about extending uwtable as well. In spirit the idea
looks great to me.
The mode removing most callee-saved registers is useful.
For example, I think linux-perf just uses pc/sp/fp (which is how its ORC
unwinder is designed).

My slight concern with uwtable(3) is that the amount of unwind
information is not monotonic.
Since sync->async and the number of registers are two dimensions,
perhaps we should use two function attributes?
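
A small sketch in C of why the two dimensions do not fit on one scale. All names here are hypothetical and only model the idea; they are not proposed attribute names.

```c
#include <assert.h>
#include <stdbool.h>

/* Two independent axes, each ordered from "less" to "more" information. */
enum unwind_precision { UNWIND_NONE, UNWIND_SYNC, UNWIND_ASYNC };
enum unwind_regs { REGS_MINIMAL, REGS_ALL };

struct unwind_req {
    enum unwind_precision precision; /* where the tables are accurate */
    enum unwind_regs regs;           /* which registers are tracked */
};

/* True if `a` provides at least as much unwind information as `b`
 * on both axes. */
static bool unwind_covers(struct unwind_req a, struct unwind_req b) {
    return a.precision >= b.precision && a.regs >= b.regs;
}
```

Under this model, "async, minimal registers" and "sync, all registers" are incomparable: neither covers the other, so a single integer ordering cannot rank them.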
>
> ~chill
BTW, are you working on improving the general CFI problems for aarch64?
I tried to understand the implementation limitation in September (in
https://reviews.llvm.org/D109253) but then stopped.
If you have patches, I'll be happy to study them:)

I know there are quite a few problems, like:

(a) .cfi_* directives in prologue are less precise

% cat a.c
void foo() {
  asm("" ::: "x23", "x24", "x25");
}
% clang --target=aarch64-linux-gnu a.c -S -o -
...
foo:                                    // @foo
        .cfi_startproc
// %bb.0:                               // %entry
        str     x25, [sp, #-32]!                // 8-byte Folded Spill
        stp     x24, x23, [sp, #16]             // 16-byte Folded Spill
        .cfi_def_cfa_offset 32   // should be immediately after the pre-increment str
        .cfi_offset w23, -8
        .cfi_offset w24, -16
        .cfi_offset w25, -32
        //APP
        //NO_APP

(b) .cfi_* directives (for MachineInstr::FrameDestroy) in epilogue are
generally missing

(c) A basic block following an exit block may have wrong CFI
information (this can be fixed with .cfi_restore)

Most problems apply to all non-x86 targets.

---

Since we are discussing asynchronous unwind tables, may I ask two
slightly off-topic things?

(1) What's your opinion on ld --no-ld-generated-unwind-info?
Mine is
https://maskray.me/blog/2020-11-15-explain-gnu-linker-options#no-ld-generated-unwind-info

(2) How should future stack unwinding strategy evolve?
Hardware assisted approach like leveraging shadow call stack?
Making FP more efficient so that user code can leverage
-fno-omit-frame-pointer -mno-omit-leaf-frame-pointer and drop
inefficient (both size and run-time performance) .eh_frame?
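
For reference, frame-pointer unwinding amounts to following a linked list. The sketch below walks a chain of frame records in C, assuming each record holds the caller's FP followed by the return address (the AArch64 frame record layout; x86-64 with frame pointers is equivalent). `struct frame_record` and `walk_frame_chain` are illustrative names, and the walk is meant to be tried on synthetic records rather than a live stack.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One frame record: saved caller FP, then the return address. */
struct frame_record {
    const struct frame_record *caller_fp;
    uintptr_t return_address;
};

/* Follow the FP chain starting at `fp`, storing up to `max` return
 * addresses into `out`. Returns the number of frames recorded. */
static size_t walk_frame_chain(const struct frame_record *fp,
                               uintptr_t *out, size_t max) {
    assert(out != NULL || max == 0);
    size_t n = 0;
    while (fp != NULL && n < max) {
        out[n++] = fp->return_address;
        fp = fp->caller_fp; /* a NULL caller FP ends the chain */
    }
    return n;
}
```

No tables are consulted at all, which is where the size and run-time advantage over .eh_frame comes from; the cost is a reserved register and the extra prologue/epilogue instructions.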

Last year I wrote a post
https://maskray.me/blog/2020-11-08-stack-unwinding while I was learning
about stack unwinding.
I am going to amend it to include my recent thoughts.

Eric Christopher via llvm-dev

2021-Nov-24 18:18 UTC

[llvm-dev] [RFC] Asynchronous unwind tables attribute

Hi Momchil,

So, I think to elaborate from the thread you're looking at separating out:

no tables,
exception handling,
instruction level unwind accuracy

for unwind tables? Some examples of cases you expect to work and explicitly
not work in each of these would be fairly motivating. Going down the use
cases for each.

Thanks!

-eric

Momchil Velikov via llvm-dev

2021-Dec-09 14:56 UTC

[llvm-dev] [RFC] Asynchronous unwind tables attribute

On Wed, 24 Nov 2021 at 18:18, Eric Christopher <echristo at gmail.com> wrote:
>
> Hi Momchil,
>
> So, I think to elaborate from the thread you're looking at separating out:
>
> no tables,
> exception handling,
> instruction level unwind accuracy
>
> for unwind tables? Some examples of cases you expect to work and explicitly
> not work in each of these would be fairly motivating. Going down the use
> cases for each.

Not really. What I'm looking for is to convey the value of the CodeGen
option `UnwindTables` from clang to LLVM.

          | nounwind 0  | nounwind 1
----------+-------------+-------------
uwtable 0 | <full,no>   | <no,no>
----------+-------------+-------------
uwtable 1 | <full,no>   | <full,no>
----------+-------------+-------------
uwtable 2 | <full,full> | <full,full>

Lacking that, a backend can choose to generate unwind tables according
to either the second or the third row, but a user has no control over
it. As different kinds of unwind tables have different functionality
and trade-offs, that should be something under user control.
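
The table above can be written down as a plain lookup, sketched here in C. The cell strings are copied verbatim and are not interpreted beyond that; `unwind_table_cell`, `lookup_unwind_cell` and `same_tables` are names invented for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Rows: uwtable 0..2. Columns: nounwind 0..1. Cells as in the table. */
static const char *const unwind_table_cell[3][2] = {
    { "<full,no>",   "<no,no>"     },
    { "<full,no>",   "<full,no>"   },
    { "<full,full>", "<full,full>" },
};

static const char *lookup_unwind_cell(int uwtable, int nounwind) {
    assert(uwtable >= 0 && uwtable <= 2);
    assert(nounwind == 0 || nounwind == 1);
    return unwind_table_cell[uwtable][nounwind];
}

/* True if two uwtable levels yield the same cell for a given nounwind. */
static bool same_tables(int uwtable_a, int uwtable_b, int nounwind) {
    return strcmp(lookup_unwind_cell(uwtable_a, nounwind),
                  lookup_unwind_cell(uwtable_b, nounwind)) == 0;
}
```

Read this way, `uwtable 0` and `uwtable 1` differ only for `nounwind` functions, while `uwtable 2` differs from both regardless of `nounwind`.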

~chill