thr3ads.net - llvm dev - [llvm-dev] [RFC] Checking inline assembly for validity [Nov 2018]

If this information is useful, please help other people find it:
Share via:

Oliver Stannard via llvm-dev

2018-Nov-26 10:54 UTC

[llvm-dev] [RFC] Checking inline assembly for validity

GCC-style inline assembly is notoriously hard to write correctly, because it is
the user's responsibility to tell the compiler about the requirements of the
assembly (inputs, output, modified registers, memory access), and getting this
wrong results in silently generating incorrect code. This is also dependent on
register allocation and scheduling decisions made by the compiler, so an inline
assembly statement may appear to work correctly, then silently break when
another change to the code or compiler upgrade causes those decisions to
change.

I've posted a prototype patch at https://reviews.llvm.org/D54891 which tries
to
improve this situation by emitting diagnostics when the instructions inside the
inline assembly string do not match the operands to the inline assembly
statement. We can do this because we parse the assembly in the same process as
the compiler, and the MC layer has some knowledge of which registers are read
and written by an assembly instruction.

For example, this C code, which tries to add 3 integers together, looks OK at
first glance:

  int add3(int a, int b, int c) {
    int x;
    asm volatile(
        "add %0, %1, %2; add %0, %0, %3"
      : "=r" (x)
      : "r" (a), "r" (b), "r" (c));
    return x;
  }

However, the compiler is allowed to allocate x (the output operand) and c (an
input operand) to the same register, as it assumes that c will be read before x
is written. This code might happen to work correctly when first written, but a
later change (either in the source code or compiler) could cause register
allocation to change, and this will start silently generating the
"wrong" code.

With this patch, the compiler emits this warning for the above code:

  test.cpp:4:7: warning: read from an inline assembly operand after first
                         output operand written, suggest adding early-clobber
                         modifier to output operand [-Winline-asm]
        "add %0, %1, %2; add %0, %0, %3"
        ^
  <inline asm>:1:30: note: instantiated into assembly here
          add r0, r0, r1; add r0, r0, r2
                                      ^
  test.cpp:4:7: note: output operand written here
        "add %0, %1, %2; add %0, %0, %3"
        ^
  <inline asm>:1:6: note: instantiated into assembly here
          add r0, r0, r1; add r0, r0, r2
              ^

The warning is a bit noisy because it prints both the original source code and
final assembly code, and the carets for the original source just point to the
start of the line, but that's a separate issue.

I've designed this to be independent of register allocation decisions, so
the
above diagnostic is always reported, regardless of whether x and c were
allocated in the same register or not. This allows it to catch code which
currently works but might break in future.

The way this works is:
- When the AsmParser reaches an INLINEASM MachineInstr, it creates an
  InlineAsmDataflowChecker object, which tracks all of the information we need
  to know about one inline assembly statement.
- The AsmPrinter examines the operands to the MachineInstruction, and calls
  functions on the tracking object to record information about each operand to
  the inline assembly block, as provided by the user and relied on by the
  compiler's optimisations and code-generation.
- While the AsmParser is generating the final assembly string (which involves
  expanding operand templates like "$0" into physical register names),
it
  records the offset from the start of the (output) string at which each
  operand expansion appeared.
- The table-generated assembly matcher is modified to record the index of the
  MCParsedAsmOperand which resulted in in the creation of each MCOperand. An
  MCParsedAsmOperand can create multiple MCOperands (for example, a memory
  operand with base and offset), but not the other way round, so this
  information is stored in the MCOperand.
- When the AsmParser is running and a tracking object is present (it is only
  present if we are parsing inline assembly), it records in each ParsedOperand
  the indexes of the inline assembly statement operands which overlap with it.
  This is done using the source location of the just-parsed MCParsedAsmOperand
  and the string offsets recored in the tracking object by the AsmPrinter.
- Finally, after an instruction has been completely parsed, the AsmParser calls
  into the tracking object with the final MCInst and list of
  MCParsedAsmOperands. With all of this information, we can match up inline
  assembly operands to the MCOperands that were created for them, and check
  that they match.

The reason that I'm posting this as an RFC is that it adds a lot of coupling
between different parts of the backend. Do people think this is an acceptable
cost for making a quite user-hostile feature a bit safer? If people agree that
this is worthwhile, there are some things I still need to do before the patch
is ready for a proper review:

- Check the memory overhead (during regular compilation) of storing the parsed
  operand number in the MCOperand. If this is too high, it could be moved to a
  separate data structure only created when parsing inline assmebly.
- There are three different concepts of "operand" here (inline
assembly
  operand, MCParsedAsmOperand and MCOperand), tidy up the naming so that these
  are a bit clearer.
- For checks which depend on the order of instructions (for example, checking
  that a clobbered register is only read from if it has previously been written
  to), these could give false positives if there is control-flow in the
  assembly. We could check if an MCInst affects control flow, buffer these
  diagnostics until the end of the assembly block, and only emit them if there
  are no control-flow instructions.

Reid Kleckner via llvm-dev

2018-Nov-26 20:36 UTC

head link

[llvm-dev] [RFC] Checking inline assembly for validity

I was following along until I got to the "way this works" part. :)

I agree with the premise 100%, writing correct inline assembly is a huge
pain. It would be nice to have some middle ground between having the user
carry the burden of writing correct constraints as in GCC inline asm, and
having the compiler guess the constraints as in MSVC inline asm.

The way I imagined this would work, before digging in too much, is that
we'd parse the GCC inline asm in the frontend, as is done with MS inline
asm, so you can get the warnings during semantic analysis. Generally,
GNU-as syntax and the GCC asm constraints should make this much, much
easier than it is for Intel syntax. The parser would parse "%N" as
some new
kind of MCOperand. We'd write generic MC checks that walk the MC
instruction stream and leverage InstrInfo to produce warnings.

That's pretty far from what you had in mind. Does that seem crazy? I think
it would help clean up a lot of the ad-hoc inline asm constraint warning
logic we have in clang, which has tons of target-specific knowledge. It
also seems like a good first step towards a less string-ly typed inline asm
instruction in LLVM IR, which is perhaps a bit out of scope.

On Mon, Nov 26, 2018 at 2:54 AM Oliver Stannard <Oliver.Stannard at
arm.com>
wrote:
> GCC-style inline assembly is notoriously hard to write correctly, because
> it is
> the user's responsibility to tell the compiler about the requirements
of
> the
> assembly (inputs, output, modified registers, memory access), and getting
> this
> wrong results in silently generating incorrect code. This is also
> dependent on
> register allocation and scheduling decisions made by the compiler, so an
> inline
> assembly statement may appear to work correctly, then silently break when
> another change to the code or compiler upgrade causes those decisions to
> change.
>
> I've posted a prototype patch at https://reviews.llvm.org/D54891 which
> tries to
> improve this situation by emitting diagnostics when the instructions
> inside the
> inline assembly string do not match the operands to the inline assembly
> statement. We can do this because we parse the assembly in the same
> process as
> the compiler, and the MC layer has some knowledge of which registers are
> read
> and written by an assembly instruction.
>
> For example, this C code, which tries to add 3 integers together, looks OK
> at
> first glance:
>
>   int add3(int a, int b, int c) {
>     int x;
>     asm volatile(
>         "add %0, %1, %2; add %0, %0, %3"
>       : "=r" (x)
>       : "r" (a), "r" (b), "r" (c));
>     return x;
>   }
>
> However, the compiler is allowed to allocate x (the output operand) and c
> (an
> input operand) to the same register, as it assumes that c will be read
> before x
> is written. This code might happen to work correctly when first written,
> but a
> later change (either in the source code or compiler) could cause register
> allocation to change, and this will start silently generating the
"wrong"
> code.
>
> With this patch, the compiler emits this warning for the above code:
>
>   test.cpp:4:7: warning: read from an inline assembly operand after first
>                          output operand written, suggest adding
> early-clobber
>                          modifier to output operand [-Winline-asm]
>         "add %0, %1, %2; add %0, %0, %3"
>         ^
>   <inline asm>:1:30: note: instantiated into assembly here
>           add r0, r0, r1; add r0, r0, r2
>                                       ^
>   test.cpp:4:7: note: output operand written here
>         "add %0, %1, %2; add %0, %0, %3"
>         ^
>   <inline asm>:1:6: note: instantiated into assembly here
>           add r0, r0, r1; add r0, r0, r2
>               ^
>
> The warning is a bit noisy because it prints both the original source code
> and
> final assembly code, and the carets for the original source just point to
> the
> start of the line, but that's a separate issue.
>
> I've designed this to be independent of register allocation decisions,
so
> the
> above diagnostic is always reported, regardless of whether x and c were
> allocated in the same register or not. This allows it to catch code which
> currently works but might break in future.
>
> The way this works is:
> - When the AsmParser reaches an INLINEASM MachineInstr, it creates an
>   InlineAsmDataflowChecker object, which tracks all of the information we
> need
>   to know about one inline assembly statement.
> - The AsmPrinter examines the operands to the MachineInstruction, and calls
>   functions on the tracking object to record information about each
> operand to
>   the inline assembly block, as provided by the user and relied on by the
>   compiler's optimisations and code-generation.
> - While the AsmParser is generating the final assembly string (which
> involves
>   expanding operand templates like "$0" into physical register
names), it
>   records the offset from the start of the (output) string at which each
>   operand expansion appeared.
> - The table-generated assembly matcher is modified to record the index of
> the
>   MCParsedAsmOperand which resulted in in the creation of each MCOperand.
> An
>   MCParsedAsmOperand can create multiple MCOperands (for example, a memory
>   operand with base and offset), but not the other way round, so this
>   information is stored in the MCOperand.
> - When the AsmParser is running and a tracking object is present (it is
> only
>   present if we are parsing inline assembly), it records in each
> ParsedOperand
>   the indexes of the inline assembly statement operands which overlap with
> it.
>   This is done using the source location of the just-parsed
> MCParsedAsmOperand
>   and the string offsets recored in the tracking object by the AsmPrinter.
> - Finally, after an instruction has been completely parsed, the AsmParser
> calls
>   into the tracking object with the final MCInst and list of
>   MCParsedAsmOperands. With all of this information, we can match up inline
>   assembly operands to the MCOperands that were created for them, and check
>   that they match.
>
> The reason that I'm posting this as an RFC is that it adds a lot of
> coupling
> between different parts of the backend. Do people think this is an
> acceptable
> cost for making a quite user-hostile feature a bit safer? If people agree
> that
> this is worthwhile, there are some things I still need to do before the
> patch
> is ready for a proper review:
>
> - Check the memory overhead (during regular compilation) of storing the
> parsed
>   operand number in the MCOperand. If this is too high, it could be moved
> to a
>   separate data structure only created when parsing inline assmebly.
> - There are three different concepts of "operand" here (inline
assembly
>   operand, MCParsedAsmOperand and MCOperand), tidy up the naming so that
> these
>   are a bit clearer.
> - For checks which depend on the order of instructions (for example,
> checking
>   that a clobbered register is only read from if it has previously been
> written
>   to), these could give false positives if there is control-flow in the
>   assembly. We could check if an MCInst affects control flow, buffer these
>   diagnostics until the end of the assembly block, and only emit them if
> there
>   are no control-flow instructions.
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20181126/12c0b8d7/attachment.html>

James Y Knight via llvm-dev

2018-Nov-26 21:42 UTC

head link

[llvm-dev] [RFC] Checking inline assembly for validity

This sort of verification definitely seems interesting. However, many of
the checks you've added look like they'll have false positives in
perfectly
valid inline-asm. IMO, a warning for incorrect inline-asm is only going to
be useful if can avoid triggering on correct code -- it should trigger only
when it's can be _certain_ that there's an error.

Some examples of where this can go wrong in your current patch:
- This will warn if you write to a hardcoded register without having marked
it clobbered. But that's perfectly valid if you've first copied the old
value somewhere (which, I'll note, also requires reading the register which
is not declared as an input!), and then restore it before leaving the
inline-asm. You might be able to salvage this warning by only warning on a
write to an arbitrary register if there hasn't previously been a read of
that register (on the assumption that the read might have saved it).
- This triggers on store instructions without having either declared an
output memory operand or marked memory as clobbered. But it's valid to
store to memory, so long as you aren't writing to memory which is
accessible to the compiler. A special case of that is writing to the stack
(as you've noted in your test-case), but it applies just as well to other
inaccessible memory as well.
- Some instructions may not have their effects fully described, when
they're not generated by the compiler. In fact, some may not be _able_ to
be described (e.g. the syscall/trap/software-interrupt instruction might
read or write any registers). You'll probably need to just give up any
dataflow-sensitive analysis if you see one of these.
- As you noted, as soon as there's any control flow, you're giving
incorrect warnings. Again just giving up on any dataflow-sensitive warnings
might make sense here.

Also, what happens to the substitution-tracking and analysis when there's a
macro call (or even definition)? Is this getting data pre- or post-
assembly-macro expansion?

On Mon, Nov 26, 2018 at 5:55 AM Oliver Stannard via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> GCC-style inline assembly is notoriously hard to write correctly, because
> it is
> the user's responsibility to tell the compiler about the requirements
of
> the
> assembly (inputs, output, modified registers, memory access), and getting
> this
> wrong results in silently generating incorrect code. This is also
> dependent on
> register allocation and scheduling decisions made by the compiler, so an
> inline
> assembly statement may appear to work correctly, then silently break when
> another change to the code or compiler upgrade causes those decisions to
> change.
>
> I've posted a prototype patch at https://reviews.llvm.org/D54891 which
> tries to
> improve this situation by emitting diagnostics when the instructions
> inside the
> inline assembly string do not match the operands to the inline assembly
> statement. We can do this because we parse the assembly in the same
> process as
> the compiler, and the MC layer has some knowledge of which registers are
> read
> and written by an assembly instruction.
>
> For example, this C code, which tries to add 3 integers together, looks OK
> at
> first glance:
>
>   int add3(int a, int b, int c) {
>     int x;
>     asm volatile(
>         "add %0, %1, %2; add %0, %0, %3"
>       : "=r" (x)
>       : "r" (a), "r" (b), "r" (c));
>     return x;
>   }
>
> However, the compiler is allowed to allocate x (the output operand) and c
> (an
> input operand) to the same register, as it assumes that c will be read
> before x
> is written. This code might happen to work correctly when first written,
> but a
> later change (either in the source code or compiler) could cause register
> allocation to change, and this will start silently generating the
"wrong"
> code.
>
> With this patch, the compiler emits this warning for the above code:
>
>   test.cpp:4:7: warning: read from an inline assembly operand after first
>                          output operand written, suggest adding
> early-clobber
>                          modifier to output operand [-Winline-asm]
>         "add %0, %1, %2; add %0, %0, %3"
>         ^
>   <inline asm>:1:30: note: instantiated into assembly here
>           add r0, r0, r1; add r0, r0, r2
>                                       ^
>   test.cpp:4:7: note: output operand written here
>         "add %0, %1, %2; add %0, %0, %3"
>         ^
>   <inline asm>:1:6: note: instantiated into assembly here
>           add r0, r0, r1; add r0, r0, r2
>               ^
>
> The warning is a bit noisy because it prints both the original source code
> and
> final assembly code, and the carets for the original source just point to
> the
> start of the line, but that's a separate issue.
>
> I've designed this to be independent of register allocation decisions,
so
> the
> above diagnostic is always reported, regardless of whether x and c were
> allocated in the same register or not. This allows it to catch code which
> currently works but might break in future.
>
> The way this works is:
> - When the AsmParser reaches an INLINEASM MachineInstr, it creates an
>   InlineAsmDataflowChecker object, which tracks all of the information we
> need
>   to know about one inline assembly statement.
> - The AsmPrinter examines the operands to the MachineInstruction, and calls
>   functions on the tracking object to record information about each
> operand to
>   the inline assembly block, as provided by the user and relied on by the
>   compiler's optimisations and code-generation.
> - While the AsmParser is generating the final assembly string (which
> involves
>   expanding operand templates like "$0" into physical register
names), it
>   records the offset from the start of the (output) string at which each
>   operand expansion appeared.
> - The table-generated assembly matcher is modified to record the index of
> the
>   MCParsedAsmOperand which resulted in in the creation of each MCOperand.
> An
>   MCParsedAsmOperand can create multiple MCOperands (for example, a memory
>   operand with base and offset), but not the other way round, so this
>   information is stored in the MCOperand.
> - When the AsmParser is running and a tracking object is present (it is
> only
>   present if we are parsing inline assembly), it records in each
> ParsedOperand
>   the indexes of the inline assembly statement operands which overlap with
> it.
>   This is done using the source location of the just-parsed
> MCParsedAsmOperand
>   and the string offsets recored in the tracking object by the AsmPrinter.
> - Finally, after an instruction has been completely parsed, the AsmParser
> calls
>   into the tracking object with the final MCInst and list of
>   MCParsedAsmOperands. With all of this information, we can match up inline
>   assembly operands to the MCOperands that were created for them, and check
>   that they match.
>
> The reason that I'm posting this as an RFC is that it adds a lot of
> coupling
> between different parts of the backend. Do people think this is an
> acceptable
> cost for making a quite user-hostile feature a bit safer? If people agree
> that
> this is worthwhile, there are some things I still need to do before the
> patch
> is ready for a proper review:
>
> - Check the memory overhead (during regular compilation) of storing the
> parsed
>   operand number in the MCOperand. If this is too high, it could be moved
> to a
>   separate data structure only created when parsing inline assmebly.
> - There are three different concepts of "operand" here (inline
assembly
>   operand, MCParsedAsmOperand and MCOperand), tidy up the naming so that
> these
>   are a bit clearer.
> - For checks which depend on the order of instructions (for example,
> checking
>   that a clobbered register is only read from if it has previously been
> written
>   to), these could give false positives if there is control-flow in the
>   assembly. We could check if an MCInst affects control flow, buffer these
>   diagnostics until the end of the assembly block, and only emit them if
> there
>   are no control-flow instructions.
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20181126/a757268a/attachment.html>

Oliver Stannard via llvm-dev

2018-Nov-27 13:40 UTC

head link

[llvm-dev] [RFC] Checking inline assembly for validity

Hi James,

I don't think it's going to be possible to avoid all false positives,
while
still detecting a reasonable amount of invalid code. I've tried to pick the
rules to balance the two, rather than sticking to one extreme, and that's
the
reason I've made these warnings rather than errors. That said, this is a
prototype, and there's obviously room for improvement.

I think it would be useful to allow the user to suppress these warnings, both
on a per-rule and per-statement basis. That's a bit hard to do with the
checks
happening in the backend, but maybe if Reid's suggestion of moving this into
clang works out, that would make this easier.
> This will warn if you write to a hardcoded register without having marked
it
> clobbered. But that's perfectly valid if you've first copied the
old value
> somewhere (which, I'll note, also requires reading the register which
is not
> declared as an input!), and then restore it before leaving the inline-asm.
> You might be able to salvage this warning by only warning on a write to an
> arbitrary register if there hasn't previously been a read of that
register
> (on the assumption that the read might have saved it).
Do you know of any cases where this is a useful pattern in real code? I'd
expect that it would be better to simply mark any used registers as clobbered,
and let the compiler save them if they happen to be live at that point.
> This triggers on store instructions without having either declared an
output
> memory operand or marked memory as clobbered. But it's valid to store
to
> memory, so long as you aren't writing to memory which is accessible to
the
> compiler. A special case of that is writing to the stack (as you've
noted in
> your test-case), but it applies just as well to other inaccessible memory
as
> well.
Maybe we could add a heuristic to allow memory accesses if the stack pointer is
written to, but I don't think there's any way to avoid the warning in
every
valid case without completely disabling it, thus failing to diagnose the common
mistakes.
> Some instructions may not have their effects fully described, when
they're
> not generated by the compiler. In fact, some may not be _able_ to be
> described (e.g. the syscall/trap/software-interrupt instruction might read
or
> write any registers). You'll probably need to just give up any
> dataflow-sensitive analysis if you see one of these.
Good point, we should probably assume that any register listed as clobbered or
a physical register operand is set by these instructions, so we don't emit
the
"reading an undefined register" warnings after them. I think we can
also add
call instructions to that list, as a regular function call could also write
registers, and might not even be using a standard calling convention.
> As you noted, as soon as there's any control flow, you're giving
incorrect
> warnings. Again just giving up on any dataflow-sensitive warnings might
make
> sense here.
Yes, I think we need to suppress some of the warnings if there is control flow
inside the assembly. I think the way to do this would be to buffer these
warnings until the end of the block, and only emit them if
MCInstrDesc::mayAffectControlFlow was false for every instruction.
> Also, what happens to the substitution-tracking and analysis when
there's a
> macro call (or even definition)? Is this getting data pre- or post-
> assembly-macro expansion?
This happens after assembly macro expansion, so it sees every instruction in
the order in which it gets sent to the MCStreamer.

Oliver

From: James Y Knight [mailto:jyknight at google.com]
Sent: 26 November 2018 21:42
To: Oliver Stannard
Cc: llvm-dev; nd
Subject: Re: [llvm-dev] [RFC] Checking inline assembly for validity

This sort of verification definitely seems interesting. However, many of the
checks you've added look like they'll have false positives in perfectly
valid inline-asm. IMO, a warning for incorrect inline-asm is only going to be
useful if can avoid triggering on correct code -- it should trigger only when
it's can be _certain_ that there's an error.

Some examples of where this can go wrong in your current patch:
- This will warn if you write to a hardcoded register without having marked it
clobbered. But that's perfectly valid if you've first copied the old
value somewhere (which, I'll note, also requires reading the register which
is not declared as an input!), and then restore it before leaving the
inline-asm. You might be able to salvage this warning by only warning on a write
to an arbitrary register if there hasn't previously been a read of that
register (on the assumption that the read might have saved it).
- This triggers on store instructions without having either declared an output
memory operand or marked memory as clobbered. But it's valid to store to
memory, so long as you aren't writing to memory which is accessible to the
compiler. A special case of that is writing to the stack (as you've noted in
your test-case), but it applies just as well to other inaccessible memory as
well.
- Some instructions may not have their effects fully described, when they're
not generated by the compiler. In fact, some may not be _able_ to be described
(e.g. the syscall/trap/software-interrupt instruction might read or write any
registers). You'll probably need to just give up any dataflow-sensitive
analysis if you see one of these.
- As you noted, as soon as there's any control flow, you're giving
incorrect warnings. Again just giving up on any dataflow-sensitive warnings
might make sense here.

Also, what happens to the substitution-tracking and analysis when there's a
macro call (or even definition)? Is this getting data pre- or post-
assembly-macro expansion?

On Mon, Nov 26, 2018 at 5:55 AM Oliver Stannard via llvm-dev <llvm-dev at
lists.llvm.org<mailto:llvm-dev at lists.llvm.org>> wrote:
GCC-style inline assembly is notoriously hard to write correctly, because it is
the user's responsibility to tell the compiler about the requirements of the
assembly (inputs, output, modified registers, memory access), and getting this
wrong results in silently generating incorrect code. This is also dependent on
register allocation and scheduling decisions made by the compiler, so an inline
assembly statement may appear to work correctly, then silently break when
another change to the code or compiler upgrade causes those decisions to
change.

I've posted a prototype patch at https://reviews.llvm.org/D54891 which tries
to
improve this situation by emitting diagnostics when the instructions inside the
inline assembly string do not match the operands to the inline assembly
statement. We can do this because we parse the assembly in the same process as
the compiler, and the MC layer has some knowledge of which registers are read
and written by an assembly instruction.

For example, this C code, which tries to add 3 integers together, looks OK at
first glance:

  int add3(int a, int b, int c) {
    int x;
    asm volatile(
        "add %0, %1, %2; add %0, %0, %3"
      : "=r" (x)
      : "r" (a), "r" (b), "r" (c));
    return x;
  }

However, the compiler is allowed to allocate x (the output operand) and c (an
input operand) to the same register, as it assumes that c will be read before x
is written. This code might happen to work correctly when first written, but a
later change (either in the source code or compiler) could cause register
allocation to change, and this will start silently generating the
"wrong" code.

With this patch, the compiler emits this warning for the above code:

  test.cpp:4:7: warning: read from an inline assembly operand after first
                         output operand written, suggest adding early-clobber
                         modifier to output operand [-Winline-asm]
        "add %0, %1, %2; add %0, %0, %3"
        ^
  <inline asm>:1:30: note: instantiated into assembly here
          add r0, r0, r1; add r0, r0, r2
                                      ^
  test.cpp:4:7: note: output operand written here
        "add %0, %1, %2; add %0, %0, %3"
        ^
  <inline asm>:1:6: note: instantiated into assembly here
          add r0, r0, r1; add r0, r0, r2
              ^

The warning is a bit noisy because it prints both the original source code and
final assembly code, and the carets for the original source just point to the
start of the line, but that's a separate issue.

I've designed this to be independent of register allocation decisions, so
the
above diagnostic is always reported, regardless of whether x and c were
allocated in the same register or not. This allows it to catch code which
currently works but might break in future.

The way this works is:
- When the AsmParser reaches an INLINEASM MachineInstr, it creates an
  InlineAsmDataflowChecker object, which tracks all of the information we need
  to know about one inline assembly statement.
- The AsmPrinter examines the operands to the MachineInstruction, and calls
  functions on the tracking object to record information about each operand to
  the inline assembly block, as provided by the user and relied on by the
  compiler's optimisations and code-generation.
- While the AsmParser is generating the final assembly string (which involves
  expanding operand templates like "$0" into physical register names),
it
  records the offset from the start of the (output) string at which each
  operand expansion appeared.
- The table-generated assembly matcher is modified to record the index of the
  MCParsedAsmOperand which resulted in in the creation of each MCOperand. An
  MCParsedAsmOperand can create multiple MCOperands (for example, a memory
  operand with base and offset), but not the other way round, so this
  information is stored in the MCOperand.
- When the AsmParser is running and a tracking object is present (it is only
  present if we are parsing inline assembly), it records in each ParsedOperand
  the indexes of the inline assembly statement operands which overlap with it.
  This is done using the source location of the just-parsed MCParsedAsmOperand
  and the string offsets recored in the tracking object by the AsmPrinter.
- Finally, after an instruction has been completely parsed, the AsmParser calls
  into the tracking object with the final MCInst and list of
  MCParsedAsmOperands. With all of this information, we can match up inline
  assembly operands to the MCOperands that were created for them, and check
  that they match.

The reason that I'm posting this as an RFC is that it adds a lot of coupling
between different parts of the backend. Do people think this is an acceptable
cost for making a quite user-hostile feature a bit safer? If people agree that
this is worthwhile, there are some things I still need to do before the patch
is ready for a proper review:

- Check the memory overhead (during regular compilation) of storing the parsed
  operand number in the MCOperand. If this is too high, it could be moved to a
  separate data structure only created when parsing inline assmebly.
- There are three different concepts of "operand" here (inline
assembly
  operand, MCParsedAsmOperand and MCOperand), tidy up the naming so that these
  are a bit clearer.
- For checks which depend on the order of instructions (for example, checking
  that a clobbered register is only read from if it has previously been written
  to), these could give false positives if there is control-flow in the
  assembly. We could check if an MCInst affects control flow, buffer these
  diagnostics until the end of the assembly block, and only emit them if there
  are no control-flow instructions.
_______________________________________________
LLVM Developers mailing list
llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org>
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20181127/3ad8e8a6/attachment-0001.html>

Oliver Stannard via llvm-dev

2018-Nov-27 16:00 UTC

head link

[llvm-dev] [RFC] Checking inline assembly for validity

Hi Reid,

I wrote this as part of the normal assembly parsing because I think it would be
very intrusive to modify the assembly parser to support parsing a raw inline
assembly string.

At least for ARM, many of the parsed operands are more complex than just a
register or immediate. For example, a memory operand has a base register, but
might also have an offset immediate or register, a shift, and/or and alignment.
We would also have to modify the expression parser to allow operand references
inside expressions, without needing to know if they are immediates or symbol
names.

I think this is a lot easier for MS-style inline assembly, because it only
works with a few simple operand types, whereas this checker needs to understand
all of the more complex operands (well, excluding the immediate-only ones).
That said, I'm not too familiar with how the MS-style inline assembly
support
works, so I'll investigate that a bit more to see if there's an easier
way to
do this.

My approach also has the advantage that it needs only very small changes in the
assembly parser, which is different for each architecture, so moving more of
the change into the assembly parser means more code needs to be re-written for
each backend we want this to work with. I think the target-specific code in my
prototype could be reduced to almost nothing if the assembly parser set
accurate start _and_ end locations for every operand, but unfortunately
that's
not currently the case, at least for ARM.

Oliver

From: Reid Kleckner [mailto:rnk at google.com]
Sent: 26 November 2018 20:36
To: Oliver Stannard
Cc: llvm-dev; nd; Renato Golin; Tim Northover; Kristof Beyls; Jim Grosbach; Eric
Christopher
Subject: Re: [RFC] Checking inline assembly for validity

I was following along until I got to the "way this works" part. :)

I agree with the premise 100%, writing correct inline assembly is a huge pain.
It would be nice to have some middle ground between having the user carry the
burden of writing correct constraints as in GCC inline asm, and having the
compiler guess the constraints as in MSVC inline asm.

The way I imagined this would work, before digging in too much, is that we'd
parse the GCC inline asm in the frontend, as is done with MS inline asm, so you
can get the warnings during semantic analysis. Generally, GNU-as syntax and the
GCC asm constraints should make this much, much easier than it is for Intel
syntax. The parser would parse "%N" as some new kind of MCOperand.
We'd write generic MC checks that walk the MC instruction stream and
leverage InstrInfo to produce warnings.

That's pretty far from what you had in mind. Does that seem crazy? I think
it would help clean up a lot of the ad-hoc inline asm constraint warning logic
we have in clang, which has tons of target-specific knowledge. It also seems
like a good first step towards a less string-ly typed inline asm instruction in
LLVM IR, which is perhaps a bit out of scope.

On Mon, Nov 26, 2018 at 2:54 AM Oliver Stannard <Oliver.Stannard at
arm.com<mailto:Oliver.Stannard at arm.com>> wrote:
GCC-style inline assembly is notoriously hard to write correctly, because it is
the user's responsibility to tell the compiler about the requirements of the
assembly (inputs, output, modified registers, memory access), and getting this
wrong results in silently generating incorrect code. This is also dependent on
register allocation and scheduling decisions made by the compiler, so an inline
assembly statement may appear to work correctly, then silently break when
another change to the code or compiler upgrade causes those decisions to
change.

I've posted a prototype patch at https://reviews.llvm.org/D54891 which tries
to
improve this situation by emitting diagnostics when the instructions inside the
inline assembly string do not match the operands to the inline assembly
statement. We can do this because we parse the assembly in the same process as
the compiler, and the MC layer has some knowledge of which registers are read
and written by an assembly instruction.

For example, this C code, which tries to add 3 integers together, looks OK at
first glance:

  int add3(int a, int b, int c) {
    int x;
    asm volatile(
        "add %0, %1, %2; add %0, %0, %3"
      : "=r" (x)
      : "r" (a), "r" (b), "r" (c));
    return x;
  }

However, the compiler is allowed to allocate x (the output operand) and c (an
input operand) to the same register, as it assumes that c will be read before x
is written. This code might happen to work correctly when first written, but a
later change (either in the source code or compiler) could cause register
allocation to change, and this will start silently generating the
"wrong" code.

With this patch, the compiler emits this warning for the above code:

  test.cpp:4:7: warning: read from an inline assembly operand after first
                         output operand written, suggest adding early-clobber
                         modifier to output operand [-Winline-asm]
        "add %0, %1, %2; add %0, %0, %3"
        ^
  <inline asm>:1:30: note: instantiated into assembly here
          add r0, r0, r1; add r0, r0, r2
                                      ^
  test.cpp:4:7: note: output operand written here
        "add %0, %1, %2; add %0, %0, %3"
        ^
  <inline asm>:1:6: note: instantiated into assembly here
          add r0, r0, r1; add r0, r0, r2
              ^

The warning is a bit noisy because it prints both the original source code and
final assembly code, and the carets for the original source just point to the
start of the line, but that's a separate issue.

I've designed this to be independent of register allocation decisions, so
the
above diagnostic is always reported, regardless of whether x and c were
allocated in the same register or not. This allows it to catch code which
currently works but might break in future.

The way this works is:
- When the AsmParser reaches an INLINEASM MachineInstr, it creates an
  InlineAsmDataflowChecker object, which tracks all of the information we need
  to know about one inline assembly statement.
- The AsmPrinter examines the operands to the MachineInstruction, and calls
  functions on the tracking object to record information about each operand to
  the inline assembly block, as provided by the user and relied on by the
  compiler's optimisations and code-generation.
- While the AsmParser is generating the final assembly string (which involves
  expanding operand templates like "$0" into physical register names),
it
  records the offset from the start of the (output) string at which each
  operand expansion appeared.
- The table-generated assembly matcher is modified to record the index of the
  MCParsedAsmOperand which resulted in in the creation of each MCOperand. An
  MCParsedAsmOperand can create multiple MCOperands (for example, a memory
  operand with base and offset), but not the other way round, so this
  information is stored in the MCOperand.
- When the AsmParser is running and a tracking object is present (it is only
  present if we are parsing inline assembly), it records in each ParsedOperand
  the indexes of the inline assembly statement operands which overlap with it.
  This is done using the source location of the just-parsed MCParsedAsmOperand
  and the string offsets recored in the tracking object by the AsmPrinter.
- Finally, after an instruction has been completely parsed, the AsmParser calls
  into the tracking object with the final MCInst and list of
  MCParsedAsmOperands. With all of this information, we can match up inline
  assembly operands to the MCOperands that were created for them, and check
  that they match.

The reason that I'm posting this as an RFC is that it adds a lot of coupling
between different parts of the backend. Do people think this is an acceptable
cost for making a quite user-hostile feature a bit safer? If people agree that
this is worthwhile, there are some things I still need to do before the patch
is ready for a proper review:

- Check the memory overhead (during regular compilation) of storing the parsed
  operand number in the MCOperand. If this is too high, it could be moved to a
  separate data structure only created when parsing inline assmebly.
- There are three different concepts of "operand" here (inline
assembly
  operand, MCParsedAsmOperand and MCOperand), tidy up the naming so that these
  are a bit clearer.
- For checks which depend on the order of instructions (for example, checking
  that a clobbered register is only read from if it has previously been written
  to), these could give false positives if there is control-flow in the
  assembly. We could check if an MCInst affects control flow, buffer these
  diagnostics until the end of the assembly block, and only emit them if there
  are no control-flow instructions.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20181127/5c226137/attachment.html>

Apparently Analagous Threads

Search for more possibly parallel threads

llvm dev - Nov 2018 - [RFC] Checking inline assembly for validity

[llvm-dev] [RFC] Checking inline assembly for validity

[llvm-dev] [RFC] Checking inline assembly for validity

[llvm-dev] [RFC] Checking inline assembly for validity

[llvm-dev] [RFC] Checking inline assembly for validity

[llvm-dev] [RFC] Checking inline assembly for validity

Apparently Analagous Threads