thr3ads.net - llvm dev - [LLVMdev] RFC building a target MCAsmParser [Apr 2015]


Colin LeMahieu

2015-Apr-14 17:58 UTC

[LLVMdev] RFC building a target MCAsmParser

Hi everyone.  We're interested in contributing a Hexagon assembler to MC and
we're looking for comments on a good way to integrate the grammar into the
infrastructure.

 

We rely on having a robust assembler because we have a large base of
developers that write in assembly due to low power requirements for mobile
devices.  We put in some C-like concepts to make the syntax easier and this
design is fairly well received by users.

 

The following is a list of grammar snippets we've had trouble integrating
into the asm parser framework.

 

Instruction packets are optionally enclosed in braces.

    { r0 = add(r1, r2) r1 = add(r2, r0) }

 

Register can be the beginning of a statement.  Register transfers have no
mnemonic.

    r0 = r1

 

Double registers have a colon in the middle, which can look like a label.

    r1:0 = add(r3:2, r5:4)
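
To illustrate the ambiguity, here is a minimal Python sketch of one way a lexer could tell a register pair from a label: try the pair pattern first and only then fall back to treating `name:` as a label. The function name, the regular expressions, and the "odd register followed by the even register below it" rule are assumptions made for this sketch, not code from the Hexagon toolchain.

```python
import re

# Hypothetical patterns, for illustration only.
PAIR = re.compile(r"r(\d+):(\d+)\b")         # e.g. r1:0, r5:4
LABEL = re.compile(r"([A-Za-z_.][\w.]*):")   # e.g. loop_start:


def classify_prefix(stmt):
    """Classify the start of a statement as a register pair, a label, or other."""
    stmt = stmt.lstrip()
    m = PAIR.match(stmt)
    if m:
        hi, lo = int(m.group(1)), int(m.group(2))
        # Assumed rule: a pair is an odd register followed by the even one below it.
        if hi == lo + 1 and lo % 2 == 0:
            return ("regpair", m.group(0))
    m = LABEL.match(stmt)
    if m:
        return ("label", m.group(1))
    return ("other", None)
```

With this ordering, `r1:0 = add(r3:2, r5:4)` starts with a register pair, while `loop_start: r0 = r1` starts with a label.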

 

Predicated variants for many instructions

    if(p1) r0 = add(r1, r2)

 

Dense semantics for DSP applications.  Complex multiply optionally shifting
result left by 1 with optional rounding and optional saturation

    r0 = cmpy(r1, r2):<<1:rnd:sat
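
As a rough illustration of how such optional suffixes could be separated from the instruction body, the Python sketch below splits everything after the last closing parenthesis on colons. The helper name is hypothetical, and it assumes modifiers only ever follow the final `)`.

```python
def split_modifiers(stmt):
    """Split a statement into its body and a list of trailing ':' modifiers."""
    close = stmt.rfind(")")
    if close == -1:
        # No call-like operand list, so there is nothing to split off.
        return stmt.strip(), []
    body, tail = stmt[:close + 1], stmt[close + 1:]
    mods = [m.strip() for m in tail.split(":") if m.strip()]
    return body.strip(), mods
```

For the example above this yields the body `r0 = cmpy(r1, r2)` and the modifiers `<<1`, `rnd` and `sat`.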

 

Hardware loops ended by optional packet suffix

    { r0 = r1 }:endloop0:endloop1

 

We found the Hexagon grammar to be straightforward to implement using plain
lex / parse but harder within the MCTargetAsmParser.

 

We were thinking that one way to get the grammar to work would involve modifying
tablegen and the main asm parser loop.  We'd have to make tablegen break
down each instruction into a sequence of tokens and build a sorted
matching table based on the set of these sequences.  The matching loop would
bisect this sorted list looking for a match.  We think existing grammars
would be unaffected; all existing instructions start with a mnemonic, so
their first token would be an identifier followed by the same sequence of
tokens they currently have.
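
The idea of a sorted table of token sequences searched by bisection can be sketched in Python as follows. This is only an illustration of the lookup scheme described above, not TableGen output: the tokenizer, the `REG`/`IMM` operand classes, and the opcode tags (`ADD_rr`, `ADD_ri`, `TFR`, `ADD_acc_ri`) are all hypothetical.

```python
import bisect
import re

# Hypothetical tokenizer: identifiers, numbers, '<<', or any single symbol.
TOKEN = re.compile(r"\s*(<<|[A-Za-z_]\w*|\d+|\S)")


def classify(tok):
    """Map operand tokens to a class; keep every other token literally."""
    if re.fullmatch(r"r\d+", tok):
        return "REG"
    if re.fullmatch(r"\d+", tok):
        return "IMM"
    return tok


def shape(text):
    """Reduce a statement to the token sequence used as the table key."""
    return tuple(classify(t) for t in TOKEN.findall(text))


# Sorted matching table: (token sequence, made-up opcode tag).
PATTERNS = sorted([
    (shape("r0 = add(r1, r2)"), "ADD_rr"),
    (shape("r0 = add(r1, #2)"), "ADD_ri"),
    (shape("r0 = r1"), "TFR"),
    (shape("r0 += add(r1, #3)"), "ADD_acc_ri"),
])
KEYS = [key for key, _ in PATTERNS]


def match(text):
    """Bisect the sorted table for the statement's token sequence."""
    key = shape(text)
    i = bisect.bisect_left(KEYS, key)
    if i < len(KEYS) and KEYS[i] == key:
        return PATTERNS[i][1]
    return None
```

Here `match("r5 = add(r6, r7)")` and `match("r3 = r9")` find different entries even though neither statement begins with a mnemonic, and a statement with no table entry returns `None`.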

 

Let us know if we're likely to run into any issues making these changes or
if there are other recommendations on what we could do.  Thanks!

 

Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, 
a Linux Foundation Collaborative Project

 


Reid Kleckner

2015-Apr-14 18:16 UTC


[LLVMdev] RFC building a target MCAsmParser

Frankly, the MC assembler infrastructure is pretty weak. I've spoken
personally to some of the developers who wrote it, and they said they know
it's bad and suggested that it be rewritten. My experience from attempting
to make MC parse Intel x86 asm confirms this.

Here are some questions for you:
Do you need a high-level macro assembler designed for human consumption?
Do you need Clang to understand inline assembly that uses these high-level
assembly constructs?
Do you need the assembler to parse the output of `clang -S`?

It might be best to have your own small frontend for the macro assembler
language that produces MCInsts which can be printed using a more
traditional gas-like mnemonic. Running llvm-mc on a Hexagon assembly file
and re-emitting assembly would parse the high-level assembly and produce
the low-level assembly.

Alternatively, you can do as you say and try to make MC into more of a real
language frontend. The major problem is that lexing rules are not the same
across all targets, even when they all use a gas-like syntax.


Joerg Sonnenberger

2015-Apr-14 18:21 UTC


[LLVMdev] RFC building a target MCAsmParser

On Tue, Apr 14, 2015 at 12:58:51PM -0500, Colin LeMahieu wrote:
> We found the Hexagon grammar to be straight forward to implement using
> plain lex / parse but harder within the MCTargetAsmParser.

Can you go into more detail about what exactly you find difficult? X86 already
has the problem of integrating two quite different assembler variants.
Your grammar also seems to lack a number of things, such as how to specify
sections and similar metadata?

Joerg

Krzysztof Parzyszek

2015-Apr-14 18:43 UTC


[LLVMdev] RFC building a target MCAsmParser

On 4/14/2015 1:21 PM, Joerg Sonnenberger wrote:
> On Tue, Apr 14, 2015 at 12:58:51PM -0500, Colin LeMahieu wrote:
>> We found the Hexagon grammar to be straight forward to implement using
>> plain lex / parse but harder within the MCTargetAsmParser.
>
> Can you go into more detail what exactly you find difficult? X86 already
> has the problem of integrating two quite different assembler variants.
> Your grammar also seems to lack a number of things like how to specify
> sections and similar meta data?

Hexagon assembly syntax doesn't have mnemonics in the same sense as
other architectures.  Take "add" as an example:

      r0 = add(r1, r2)
      r0 = add(r1, #2)
    r1:0 = add(r3:2, r5:4)
    r1:0 = add(r2, r5:4)

There is more:

      r0  = add(r1, add(r2, #5))
    r1:0  = add(r3:2, r5:4):sat
      r0 += add(r1, #3)

predicated versions:

   if (!p0) r0 = add(r1, #4)

etc.

In the architecture definition they all have different tags, which we 
use in the .td files, but for the assembly programmer there is "add" 
stuck in various places in the instruction string.  Most architectures 
use the syntax where the mnemonic comes first, followed by operands. 
Ours is very different.

-Krzysztof



-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, 
hosted by The Linux Foundation

Colin LeMahieu

2015-Apr-14 19:00 UTC


[LLVMdev] RFC building a target MCAsmParser

Sure, the first immediate problem we run into is statements not beginning
with mnemonics.  The assembly syntax uses C-like assignments, so most
statements start with a register token, then an equals token, and then the
mnemonic followed by the remaining tokens.

The next issue is with creating instruction bundles.  We enclose a certain
number of instructions in curly braces to signify packet begin/end, so this
doesn't fit into the one line = one instruction rule.
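
A minimal Python sketch of why packets break a line-at-a-time loop: the parser has to carry state across lines until the closing brace. The sketch assumes braces sit on their own lines, that an instruction outside braces forms a packet of one, and it ignores any suffix after the closing brace; `group_packets` is a hypothetical helper, not part of MC.

```python
def group_packets(lines):
    """Group instruction lines into packets delimited by '{' and '}'."""
    packets = []
    current = None  # instructions of the open packet, or None outside braces
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line == "{":
            if current is not None:
                raise ValueError("nested packet")
            current = []
        elif line.startswith("}"):
            # Anything after '}' (such as a packet suffix) is ignored here.
            if current is None:
                raise ValueError("unmatched closing brace")
            packets.append(current)
            current = None
        elif current is not None:
            current.append(line)
        else:
            # Assumption: an unbraced instruction is a single-instruction packet.
            packets.append([line])
    if current is not None:
        raise ValueError("unterminated packet")
    return packets
```

Feeding it a braced pair of instructions followed by a bare `r0 = r1` produces two packets, the first holding two instructions and the second holding one.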

The way to describe sections and metadata is defined; we didn't have
many problems there, so they're omitted.

I see the X86 target has a matcher for each syntax.  Were there other things
in the grammar that were hard to integrate cleanly?


Colin LeMahieu

2015-Apr-14 19:23 UTC


[LLVMdev] RFC building a target MCAsmParser

An assembler frontend for converting to MCInsts is a possibility.  The parser
for that would probably be generated using a third-party parser generator.  At
that point we could just substitute the generated library for the TableGen
match tables.  Do you know if anyone’s considered including a parser lib
like this before?

 

The TableGen changes I alluded to would get us close to parsing without hacks,
though it would still be deficient compared to a full parser.  These changes
would probably deviate the least from other targets, depending on
what everyone wants long-term.

 


Colin LeMahieu

2015-Apr-15 19:22 UTC


[LLVMdev] RFC building a target MCAsmParser

One possibility for which we'd be interested in getting feedback is allowing
a target to fully handle the parsing process.

 

We have a generated parser that can output other compiler IRs and this could
be changed to output MCInsts.  If we could get the input text stream and an
output MC stream we could have a target specific way of doing all parsing.
Perhaps this would be useful to other targets that have difficulty?

 

A parser generator isn't distributed with the project so we could publish
the parser generator input and output.

 


Tom Stellard

2015-Apr-15 19:58 UTC


[LLVMdev] RFC building a target MCAsmParser

On Wed, Apr 15, 2015 at 02:22:54PM -0500, Colin LeMahieu wrote:
> One possibility for which we'd be interested in getting feedback is
> allowing a target to fully handle the parsing process.

How many of the problems that you have encountered come from the C++ code that
is generated by TableGen?  If you ignore all the TableGen'd code, does the
MCAssembler interface give you enough freedom to do what you want?

-Tom