All,
The purpose of this email is to open a discussion on the design and
implementation of support for Microsoft-style inline assembly in clang.  The
majority of the work will be done in clang, but I'm CC'ing both dev lists to
get the most exposure possible.  There are _many_ open questions and the
forthcoming description is likely to be obscure in many places (at times this
is on purpose, while other times you'll just have to bear with me), but
hopefully this will be a good starting point.
Clang currently supports the GNU-style inline assembly of the form:
  asm ( assembler template
      : output operands                /* optional */
      : input operands                 /* optional */
      : list of clobbered registers    /* optional */
      );
When writing extended assembly, the user specifies the output operands, input
operands, constraints and a list of clobbered registers.  This greatly
simplifies the job of the compiler, but can be prone to user error.
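As a concrete illustration of the form above, here is a minimal sketch in C
that adds two integers with GNU-style extended asm.  It assumes an x86 target
and a GCC-compatible compiler (any other target takes the plain C fallback).
The names add_via_asm and add_via_asm_example are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Adds b to a with GNU-style extended asm (AT&T operand order: the
 * destination is the last operand).  Illustrative only. */
static uint32_t add_via_asm(uint32_t a, uint32_t b) {
    uint32_t sum;
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__ ("addl %2, %0"
             : "=r" (sum)          /* output operand: any general register  */
             : "0" (a), "r" (b)    /* inputs: a shares operand 0's register */
             : "cc"                /* clobbers: addl rewrites the flags     */
             );
#else
    sum = a + b;                   /* fallback so the sketch builds elsewhere */
#endif
    return sum;
}

/* Usage example: a plain call, checked against the expected sum. */
static void add_via_asm_example(void) {
    assert(add_via_asm(2u, 3u) == 5u);
}
```

Note that the compiler never inspects the asm text here; it relies entirely on
the three operand lists, which is why a wrong constraint or a missing clobber
goes undiagnosed.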
Conversely, Microsoft-style inline assembly uses ordinary asm text without 
explicitly defining output operands, input operands, constraints and clobbered
registers; the compiler is responsible for discovering this information.
1. How is clang going to discover the input operands, output operands, 
constraints, and clobbered registers?
Support for parsing the Intel asm dialect exists in the X86AsmParser.  Ideally,
all the necessary information can be provided to us by the MC layer.  Thus,
given a valid asm statement that does not reference variable names, function
names, or labels, such as
void foo(void) {
  __asm mov eax, eax
}
the AsmParser can match the asm statement to an MCInst.  From the MCInst we can
obtain the MCInstrDesc information, which is used to determine the number of
operands, uses, defs and constraints of each asm statement.  This information is
also helpful for semantic checking.  Asm statements that reference variables
require special handling, however.
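To make the idea of deriving uses and defs from an instruction description
more concrete, here is a toy model in C.  It is not the LLVM MCInstrDesc API:
the struct, the table and the function names are all invented for this sketch,
and it assumes (as a simplification) that the operands an instruction writes
are listed before the ones it only reads, matching Intel operand order for mov.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a per-instruction description (hypothetical). */
struct toy_instr_desc {
    const char *mnemonic;
    unsigned num_operands;
    unsigned num_defs;      /* leading operands written by the instruction */
};

static const struct toy_instr_desc toy_table[] = {
    { "mov", 2, 1 },        /* mov dst, src: dst is a def, src is a use */
    { "cmp", 2, 0 },        /* cmp a, b: both operands are only read    */
    { "nop", 0, 0 },
};

/* Returns the description for a mnemonic, or NULL if it is unknown. */
static const struct toy_instr_desc *toy_lookup(const char *mnemonic) {
    assert(mnemonic != NULL);
    for (size_t i = 0; i < sizeof toy_table / sizeof toy_table[0]; i++)
        if (strcmp(toy_table[i].mnemonic, mnemonic) == 0)
            return &toy_table[i];
    return NULL;
}

/* Returns 1 if the operand is a def (output), 0 if it is a use (input),
 * and -1 if the mnemonic is unknown or the index is out of range. */
static int toy_operand_is_def(const char *mnemonic, unsigned index) {
    const struct toy_instr_desc *desc = toy_lookup(mnemonic);
    if (desc == NULL || index >= desc->num_operands)
        return -1;
    return index < desc->num_defs ? 1 : 0;
}
```

With such a table, "mov eax, eax" yields one output operand and one input
operand without the user having to spell either out.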
unsigned foo(void) {
  unsigned i = 1, j;
  __asm mov eax, i
  __asm mov j, eax
  return j;
}
In the above example, the two asm statements are not valid assembly on their
own, because i and j are C variables rather than registers or memory operands.
Thus, the AsmParser cannot parse these statements without some modifications.
Type information, provided by Sema::LookupName(), is used to guess the
appropriate register class for patching the asm statements.  For example, the
first asm statement could be modified as:
 mov eax, i -> mov eax, ecx
Here we're somewhat guessing that ecx is valid.  This isn't going to work in
all cases, so we may need to probe the AsmParser with several alternatives.
It is yet to be determined how we handle the case of multiple matches.
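The probing idea can be sketched roughly as follows.  Everything here is
hypothetical: toy_parses stands in for "would the AsmParser accept this
text?" and only understands mov between four 32-bit registers, and the
candidate list and function names are invented.  The sketch simply returns the
first candidate that parses, which sidesteps the open question of multiple
matches.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if s names one of the few registers this toy knows about. */
static int toy_is_reg32(const char *s) {
    static const char *const regs[] = { "eax", "ebx", "ecx", "edx" };
    for (size_t i = 0; i < sizeof regs / sizeof regs[0]; i++)
        if (strcmp(s, regs[i]) == 0)
            return 1;
    return 0;
}

/* Toy stand-in for the assembler: accepts only "mov <reg32>, <reg32>". */
static int toy_parses(const char *stmt) {
    char a[16], b[16];
    if (sscanf(stmt, "mov %15[^,], %15s", a, b) != 2)
        return 0;
    return toy_is_reg32(a) && toy_is_reg32(b);
}

/* Substitutes each candidate register for the operand equal to var and
 * probes the toy parser.  On success the patched statement is left in out
 * and 1 is returned; otherwise 0 is returned. */
static int toy_probe(const char *mnemonic, const char *op0, const char *op1,
                     const char *var, char *out, size_t out_size) {
    static const char *const candidates[] = { "ecx", "edx", "ebx" };
    assert(out != NULL && out_size > 0);
    for (size_t i = 0; i < sizeof candidates / sizeof candidates[0]; i++) {
        const char *a = strcmp(op0, var) == 0 ? candidates[i] : op0;
        const char *b = strcmp(op1, var) == 0 ? candidates[i] : op1;
        int n = snprintf(out, out_size, "%s %s, %s", mnemonic, a, b);
        if (n < 0 || (size_t)n >= out_size)
            return 0;               /* patched statement did not fit */
        if (toy_parses(out))
            return 1;
    }
    return 0;
}
```

For the first statement in the example, toy_probe("mov", "eax", "i", "i", ...)
produces "mov eax, ecx", mirroring the rewrite shown above.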
3. Which syntax, AT&T or Intel, should be passed to the backend?
The approach taken by llvm-gcc is to translate the Intel syntax into the AT&T
syntax in the front-end.  The advantage here is that the backend requires
minimal changes.  However, I will argue that the asm should remain in the
Intel syntax and the backend should be modified to handle this.  This has a
number of advantages: (1) it removes a great deal of complex textual
processing from the frontend, (2) it is likely to provide better diagnostic
information, and (3) emitting the Intel syntax is what the user expects.  The
backend will need to be taught the right syntax for operands (e.g., the
address of a stack variable is "DWORD PTR [8+ESP]" rather than "8(%esp)"),
however.
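The operand difference mentioned above can be illustrated with a small
formatter.  This is only a sketch: the enum and function names are invented,
it handles just a base register plus a non-negative offset for a 32-bit
access, and a real backend would have far more cases to cover.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

enum toy_dialect { TOY_ATT, TOY_INTEL };

/* Formats the address offset+reg of a 32-bit stack slot in the given
 * dialect.  reg is a lowercase register name such as "esp".  Returns the
 * snprintf result, or -1 if the register name is too long for this toy. */
static int toy_format_stack_slot(enum toy_dialect dialect, int offset,
                                 const char *reg, char *out, size_t out_size) {
    char upper[8];
    size_t len;

    assert(reg != NULL && out != NULL);
    len = strlen(reg);
    if (len >= sizeof upper)
        return -1;

    if (dialect == TOY_ATT)
        return snprintf(out, out_size, "%d(%%%s)", offset, reg);

    for (size_t i = 0; i <= len; i++)   /* copies the terminating NUL too */
        upper[i] = (char)toupper((unsigned char)reg[i]);
    return snprintf(out, out_size, "DWORD PTR [%d+%s]", offset, upper);
}
```

Called with offset 8 and register "esp", the Intel branch yields
"DWORD PTR [8+ESP]" and the AT&T branch yields "8(%esp)", the two spellings
contrasted above.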
4. How do we distinguish the assembly dialect at the IR level?
One approach would be to wrap the asm text passed to the backend with
.intel_syntax and .att_syntax.  E.g.,
  call void asm sideeffect ".intel_syntax\n\tnop\n\t.att_syntax",
      "~{dirflag},~{fpsr},~{flags}"() nounwind
Besides being rather hackish, this is unlikely to work.  Based on feedback
from other internal developers, I would like to propose a small IR extension.
Specifically, I would like to add a function attribute to inline asm calls that
indicates the syntax in which the asm is written.
E.g.,
  call void asm sideeffect "nop\n\t",
      "~{dirflag},~{fpsr},~{flags}"() nounwind iasyntaxatt
Notice the addition of the iasyntaxatt keyword.  This is used by the AsmPrinter
to determine the correct asm variant.  I've implemented this change and would
be happy to send it to the commits list for review.  However, the backend
doesn't currently use this information, so I don't think it needs to be
committed at this time.  I just wanted to give everyone an idea of what I was
thinking.
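As a rough sketch of how a consumer might turn such a keyword into a dialect
choice, consider the following C fragment.  Only iasyntaxatt comes from the
proposal; the Intel counterpart "iasyntaxintel", the enum, its numbering and
the default for a missing keyword are all assumptions made for illustration.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical dialect identifiers; the numbering is an assumption. */
enum toy_asm_dialect { TOY_DIALECT_ATT = 0, TOY_DIALECT_INTEL = 1 };

/* Maps the trailing keyword of an inline asm call to a dialect.  A NULL
 * keyword models IR written without the extension and is treated as AT&T,
 * which is an assumption of this sketch. */
static enum toy_asm_dialect toy_dialect_from_keyword(const char *keyword) {
    if (keyword == NULL || strcmp(keyword, "iasyntaxatt") == 0)
        return TOY_DIALECT_ATT;
    /* "iasyntaxintel" is a hypothetical counterpart, not from the proposal. */
    assert(strcmp(keyword, "iasyntaxintel") == 0);
    return TOY_DIALECT_INTEL;
}
```

Defaulting to AT&T keeps existing IR meaningful, since inline asm without any
keyword is already interpreted that way today.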
5. How should the inline asm be represented in the AST?
This is largely an open question and I'm still in the process of familiarizing
myself with the frontend.  Doing this in a sane way is going to be critical for
providing good diagnostic information.
6. Additional comments:
One additional goal of this project is to implement support for MS-style inline
assembly in a target-independent way.  Thus, we should be able to use the same
inline assembly syntax for other architectures (e.g., ARM).
Hopefully, this gives everyone a _very_ high level view of the direction I
would like to take for implementing MS-style inline assembly.  Your feedback
and suggestions are _very_ welcome.
 Chad

> 1. How is clang going to discover the input operands, output operands,
> constraints, and clobbered registers?

The approach you described sounds good to me. Reusing all the work done in the
LLVM MC layer seems like the right approach.

> In the above example, the two asm statements are not valid assembly.
> Thus, the AsmParser cannot parse these statements without some
> modifications.

I'm not familiar with how it works, but maybe a generic hook could be added to
the AsmParser so that, when unknown identifiers are parsed, Clang can provide
the needed information.

> 3. Which syntax, AT&T or Intel, should be passed to the backend?

I'd prefer modifying the backend and passing the original syntax. Even if it
involves some modifications, the advantages you listed are worth it.

> 4. How do we distinguish the assembly dialect at the IR level?

I also agree about adding a new attribute to the IR. I'm not familiar with how
IR metadata works, but instead of having specific syntax attributes
(iasyntaxatt), it would be cleaner to have a generic asmsyntax attribute that
could take att/intel values.

> 5. How should the inline asm be represented in the AST?
>
> This is largely an open question and I'm still in the process of
> familiarizing myself with the frontend.  Doing this in a sane way is going
> to be critical for providing good diagnostic information.

At the moment it seems to be represented with a MSAsmStmt, which just provides
a string representation of the assembly code. As we parse it, it would be good
to have a proper AST for individual assembly statements. Maybe we could
represent them using a list of MCInst. I'm not sure how the MC layer works, but
maybe we would have to fix the locations to match the original source code
locations.

-- 
João Matos

On Aug 6, 2012, at 1:03 PM, João Matos wrote:

> > 1. How is clang going to discover the input operands, output operands,
> > constraints, and clobbered registers?
>
> The approach you described sounds good to me. Reusing all the work done in
> the LLVM MC layer seems the right approach.

Glad you agree.  We really shouldn't be replicating information in the
frontend; this is very error prone.

> > In the above example, the two asm statements are not valid assembly.
> > Thus, the AsmParser cannot parse these statements without some
> > modifications.
>
> I'm not familiar how it works, but maybe a generic hook can be added in
> AsmParser, so when unknown identifiers are parsed, Clang can provide the
> needed information.

My thought here was that I wanted the frontend to be able to probe MC, but not
necessarily the opposite.  I want to avoid adding this kind of logic to the MC
layer, but then again your suggestion might be the right approach.  I was
thinking it would be nice if the AsmParser understood pseudo registers (i.e.,
registers that represent a register class and not necessarily a specific
register).  One problem with this approach, however, is that you're allowing
the AsmParser to accept invalid assembly.  This could be fixed by adding a
switch that distinguishes parsing in the context of MS-style inline asm from
parsing real assembly.

> > 3. Which syntax, AT&T or Intel, should be passed to the backend?
>
> I'd prefer modifying the backend and passing the original syntax. Even if it
> involves some modifications, the advantages you listed are worth it.

Again, glad you agree.  I don't think the modifications will be that extensive.

> > 4. How do we distinguish the assembly dialect at the IR level?
>
> I also agree about adding a new attribute to the IR.  I'm not familiar with
> how IR metadata works, but instead of having specific syntax attributes
> (iasyntaxatt), it would be cleaner to have a generic asmsyntax attribute that
> could take att/intel values.

I don't think metadata would be the right approach, as it would be rather heavy
handed considering we're only storing a bit or two of information.  An
attribute is the right way to go, so this is really a discussion about syntax.
I think I prefer your syntax, but I'm not sure how this would be implemented.
I'll investigate.

> > 5. How should the inline asm be represented in the AST?
> >
> > This is largely an open question and I'm still in the process of
> > familiarizing myself with the frontend.  Doing this in a sane way is going
> > to be critical for providing good diagnostic information.
>
> At the moment it seems to be represented with a MSAsmStmt, which just
> provides a string representation of the assembly code.

This question concerns how we match the SourceLocations to the asm
string/tokens.  For the time being, I'm less concerned with good diagnostics
and more concerned with getting the functionality working.  I just wanted to
point out that we should be thinking about this (but I've already got a lot on
my plate :).

> As we parse it, it would be good to have a proper AST for individual assembly
> statements. Maybe we could represent them using a list of MCInst. Not sure
> how the MC layer works, but maybe we would have to fix the locations to match
> the original source code locations.

I'm not sure we would want to store the MCInst(s) in the AST.  After we're done
with semantic checking and we've created the IR representation (with inputs,
outputs, clobbers, constraints, etc.), I don't think there's a reason for the
MCInst(s) to persist.

> --
> João Matos

On Aug 6, 2012, at 11:14 AM, Chad Rosier <mcrosier at apple.com> wrote:

> Hopefully, this gives everyone a _very_ high level view of the direction I
> would like to take for implementing MS-style inline assembly.  Your feedback
> and suggestions are _very_ welcome.

FWIW I'm in favor of the implementation laid out here and the patches so far.
Thanks for the email Chad!

-eric