thr3ads.net - llvm dev - [LLVMdev] Hexagon Assembly parser question [Oct 2012]

David Young

2012-Oct-17 22:29 UTC

[LLVMdev] Hexagon Assembly parser question

Hi,

I'm trying to enable the Hexagon LLVM assembly parser. It seems that a lot of
work has already been done to make this parsing straightforward.

 

But..

Hexagon assembly does not follow the "Mnemonic Rx Rx ..." format that is
expected by the assembly parsing infrastructure, represented by:

StringRef Mnemonic = ((ARMOperand*)Operands[0])->getToken();

 

This assumption about the mnemonic's location applies both to the TableGen
backend, AsmMatcherEmitter, and to the .inc file it produces, where
MatchInstructionImpl is the entry point through which the assembly input is
parsed.

 

However Hexagon assembly has some features that make it more readable, such
as r1 = r2, or if(r1) r2 = mem(#immediate).  This makes taking advantage of
the existing LLVM code difficult.
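
To make the assumption concrete, here is a minimal, self-contained sketch of what happens when the first token of a statement is taken to be the mnemonic. It is not LLVM code: `tokenize` and `firstTokenAsMnemonic` are hypothetical names, and the lexer is a toy that only splits identifiers from punctuation.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy lexer (NOT the LLVM lexer): splits a statement into
// identifier-like tokens and single-character punctuation tokens.
static bool isIdentChar(char Ch) {
  unsigned char C = static_cast<unsigned char>(Ch);
  return std::isalnum(C) != 0 || Ch == '_' || Ch == '.';
}

std::vector<std::string> tokenize(const std::string &Stmt) {
  std::vector<std::string> Tokens;
  std::size_t I = 0;
  while (I < Stmt.size()) {
    if (std::isspace(static_cast<unsigned char>(Stmt[I])) != 0) {
      ++I; // skip whitespace between tokens
    } else if (isIdentChar(Stmt[I])) {
      std::size_t Start = I;
      while (I < Stmt.size() && isIdentChar(Stmt[I]))
        ++I;
      Tokens.push_back(Stmt.substr(Start, I - Start));
    } else {
      Tokens.push_back(std::string(1, Stmt[I])); // punctuation such as '='
      ++I;
    }
  }
  return Tokens;
}

// Mirrors the "Operands[0] is the mnemonic" assumption: whatever comes first
// in the statement is treated as the mnemonic.
std::string firstTokenAsMnemonic(const std::string &Stmt) {
  std::vector<std::string> Tokens = tokenize(Stmt);
  return Tokens.empty() ? std::string() : Tokens.front();
}
```

For a conventional statement such as `add r0, r1, r2` the first token is `add`, but for `r1 = r2` it is the destination register `r1`, and for `if(r1) r2 = mem(#4)` it is `if`. Neither identifies the instruction.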

 

 

Currently, I see two options.  

 

One is to preformat the assembly string(s) obtained from the .td files so
that they match the format that the TableGen backend expects, and also to
preformat the assembly input so that it can be matched against the
preformatted assembly strings.

 

The other is to write a whole new TableGen backend that doesn't rely on the
mnemonic location assumption, and hope to merge it with the current
AsmMatcherEmitter someday.

 

I am leaning toward the latter.  The former seems likely to create many
more problems in the long run.  Any thoughts, comments, or recommended
directions are appreciated.

 

Regards,

David


Jim Grosbach

2012-Oct-17 23:55 UTC


On Oct 17, 2012, at 3:29 PM, David Young <davidy at codeaurora.org> wrote:
> Hi,
> I’m trying to enable the hexagon LLVM assembly parser. It seem like
> there is a lot of work that has been done to make this parsing straightforward.
>
> But….
> Hexagon assembly does not follow the “Mnemonic Rx Rx …” format that is
> expected by the assembly parsing infrastructure, represented by:
> StringRef Mnemonic = ((ARMOperand*)Operands[0])->getToken();
>
> This Mnemonic location assumption applies to both the Tablegen Backend
> AsmMatcherEmitter processing, and the .inc file it produces where
> MatchInstructionImpl is the entry point by which the assembly input is parsed.
>
> However Hexagon assembly has some features that make it more readable, such
> as r1 = r2, or if(r1) r2 = mem(#immediate). This makes taking advantage of the
> existing LLVM code difficult.
>
> Currently, I see two options.
>
> One is to preformat the assembly string(s) obtained from the td files so
> that it is matches the format that the tablegen backend expects, and also
> preformat the assembly input so that it can be matched against the preformatted
> assembly strings.

I agree this sounds pretty hacky.

> The other is to write a whole new TD backend that doesn’t rely on the
> Mnemonic location assumption. And hope someday to merge this backend with the
> current AsmMatcherEmitter.

The table is sorted by mnemonic (more abstractly, by operator). That's
pretty fundamental to how it works, so sticking with that would be good unless
you want to write an entirely new algorithm. You could probably stick with the
current basic stuff with some fiddling in tablegen where the asm string gets
split apart when building up matchables to re-order things appropriately. Then
your ParseInstruction() implementation would do similar tricks. The printer
should "just work," thankfully.
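
The re-ordering idea can be illustrated with a small sketch. This is only one possible reading of it, not the TableGen or ParseInstruction() code: `isOperandToken` and `operatorFirst` are hypothetical helpers, and the operand test (a `$name` placeholder or `r` followed by digits) is an assumption made for the example.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical operand test: a TableGen-style "$name" placeholder, or a
// register spelled 'r' followed by digits. Real targets need more cases.
bool isOperandToken(const std::string &Tok) {
  if (Tok.empty())
    return false;
  if (Tok[0] == '$')
    return true;
  if (Tok[0] != 'r' || Tok.size() < 2)
    return false;
  return std::all_of(Tok.begin() + 1, Tok.end(), [](char C) {
    return std::isdigit(static_cast<unsigned char>(C)) != 0;
  });
}

// Moves the first non-operand token to the front so that it can play the role
// of the mnemonic. The relative order of all other tokens is preserved.
std::vector<std::string> operatorFirst(std::vector<std::string> Tokens) {
  auto It = std::find_if(
      Tokens.begin(), Tokens.end(),
      [](const std::string &T) { return !isOperandToken(T); });
  if (It != Tokens.end())
    std::rotate(Tokens.begin(), It, It + 1);
  return Tokens;
}
```

Applying the same function to the asm string from the .td file (`$dst = $src`) and to the parsed input (`r1 = r2`) yields token lists that both start with `=`, so the existing mnemonic-first lookup has something to key on. Statements that already start with a mnemonic are left unchanged.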

That said, you'll also likely have to do a bit of work in the generic
AsmParser code, as it'll likely look at statements like these and not
realize they're instruction sequences. The "mnemonic <whitespace>
operands" format is pretty strongly imprinted on everything. That's not
completely unfixable, of course, but it may be a bit tricky to avoid syntactic
ambiguities. Not impossible, mind, just tricky and something to pay very close
attention to in your design.

-Jim
>  
> I am leaning toward the latter. The other seems like it will create many
> more problems in the long run. Any thoughts, comments, or recommended
> directions are appreciated.
>  
> Regards,
> David
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

David Young

2012-Oct-18 03:47 UTC


Hi Jim,

Thanks for the quick feedback!

 

I have two thoughts along the lines that you suggest.

 

Currently the parser seems to sort and search the instruction table by
mnemonic as the first pass.  Would it be possible to sort by ConvertFn
instead, and add a generic mnemonic field to each entry?  The assembly parser
could then search first for the format of the instruction (after parsing out
the formatting fields), and then check whether that format is supported by
the mnemonic.

 

Just for my own reference, the table elements are defined as:

    static const char *const MnemonicTable;
    uint32_t Mnemonic;
    uint16_t Opcode;
    uint16_t ConvertFn;
    uint32_t RequiredFeatures;
    uint16_t Classes[18];

 

This may benefit matching in general by evening out the ranges that
std::equal_range must search through, and it may make my task easier,
since much of the Hexagon assembly grammar has no clearly defined
mnemonics.
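
As a rough illustration of the lookup being discussed, the sketch below runs std::equal_range over a small table sorted by mnemonic. `ToyEntry`, `LessMnemonic`, `makeToyTable` and `countCandidates` are hypothetical names for this example. The entry is trimmed to three fields and stores the mnemonic as a string rather than as an offset into MnemonicTable.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Toy stand-in for a generated match-table entry (fields trimmed for brevity).
struct ToyEntry {
  std::string Mnemonic; // sort key, stored inline here for simplicity
  std::uint16_t Opcode;
  std::uint16_t ConvertFn;
};

// Comparator usable both for sorting and for entry-versus-key comparisons.
struct LessMnemonic {
  bool operator()(const ToyEntry &A, const ToyEntry &B) const {
    return A.Mnemonic < B.Mnemonic;
  }
  bool operator()(const ToyEntry &E, const std::string &M) const {
    return E.Mnemonic < M;
  }
  bool operator()(const std::string &M, const ToyEntry &E) const {
    return M < E.Mnemonic;
  }
};

// Builds a small table and sorts it by mnemonic, as the matcher expects.
std::vector<ToyEntry> makeToyTable() {
  std::vector<ToyEntry> Table = {{"add", 1, 0}, {"=", 10, 1}, {"sub", 2, 0},
                                 {"=", 11, 2},  {"add", 3, 3}, {"=", 12, 4}};
  std::stable_sort(Table.begin(), Table.end(), LessMnemonic());
  return Table;
}

// Number of entries the matcher would still have to examine for a mnemonic.
std::size_t countCandidates(const std::vector<ToyEntry> &Table,
                            const std::string &Mnemonic) {
  auto Range =
      std::equal_range(Table.begin(), Table.end(), Mnemonic, LessMnemonic());
  return static_cast<std::size_t>(Range.second - Range.first);
}
```

In this toy table, `=` selects three candidates while `sub` selects one. If many Hexagon instructions ended up sharing one operator-like key, the binary search would narrow the table much less than it does for a conventional mnemonic. That is the unevenness the paragraph above refers to.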

 

The other thought is whether we might be able to use std::map (an ordered
tree) or std::tr1::unordered_map (a hash table) when doing the matching.
It seems that most of the hard work of turning the format into a ConvertFn
index has already been done by the current implementation.  Maybe a few
hashes could get you to a match directly?
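
A hash-based lookup of this kind might look like the sketch below. It uses C++11 std::unordered_map rather than std::tr1::unordered_map, and everything in it is an assumption made for illustration: the `formatKey` signature scheme (every register replaced by `R`), the `FormatTable` layout, and the helper names are hypothetical and not part of the generated matcher.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical register test: 'r' followed by one or more digits.
bool isRegisterToken(const std::string &Tok) {
  if (Tok.size() < 2 || Tok[0] != 'r')
    return false;
  return std::all_of(Tok.begin() + 1, Tok.end(), [](char C) {
    return std::isdigit(static_cast<unsigned char>(C)) != 0;
  });
}

// Builds a format signature by replacing every register with "R" and joining
// the tokens with single spaces, e.g. {"r1","=","r2"} -> "R = R".
std::string formatKey(const std::vector<std::string> &Tokens) {
  std::string Key;
  for (const std::string &Tok : Tokens) {
    if (!Key.empty())
      Key += ' ';
    if (isRegisterToken(Tok))
      Key += "R";
    else
      Key += Tok;
  }
  return Key;
}

// Maps a format signature to the indices of candidate table entries.
using FormatTable = std::unordered_map<std::string, std::vector<unsigned>>;

// Returns the candidate list for a statement, or nullptr if no format matches.
const std::vector<unsigned> *lookupFormat(const FormatTable &Table,
                                          const std::vector<std::string> &Tokens) {
  auto It = Table.find(formatKey(Tokens));
  return It == Table.end() ? nullptr : &It->second;
}
```

One hash lookup reaches all entries with the same shape, but the candidates still have to be checked against operand classes and required features, so this replaces only the first narrowing step.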

 

Sorry for any dumb questions.  I'm an LLVM newbie.

 

Regards,

David

David Young

2012-Nov-25 19:52 UTC


Hi Jim,

 

I am now able to parse the mnemonic out of the instruction table and create
the .inc file.  I currently do this by searching for the mnemonic based on
the instruction's structure.  The code is not quite ready for review yet, but
it is enough to move forward.

 

I'm now stuck on the second part of the email that you sent, concerning
AsmParser.

 

In particular, your statement has been very prescient:

> That said, you'll also likely have to do a bit of work in the generic
> AsmParser code, as it'll likely look at statements like these and not
> realize they're instruction sequences. The "mnemonic <whitespace> operands"
> format is pretty strongly imprinted on everything.

 

In Hexagon's case, statements like these

    r0 = ##.L.str
    r1 = #0
    r0 = r1

produce errors because they are parsed as assignments, in AsmParser.cpp at
line 1206:

    case AsmToken::Equal:
      // identifier '=' ... -> assignment statement
      Lex();
      return ParseAssignment(IDVal, true);

 

as the equal sign is the second token.  Would it be possible to perform this
check after everything else has been checked?  That would let me determine
whether the = sign belongs to an instruction before the statement is
classified as an assignment.  I was also thinking that, since we are mainly
trying to match instructions, input parsing might be faster if we didn't try
to identify everything as a directive first.
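
One way to picture the check being asked for is a small classifier that lets the target claim an `identifier = ...` statement before the generic assignment rule fires. This is a sketch under stated assumptions, not the AsmParser code: `StatementKind`, `classifyStatement` and `isToyHexagonRegister` are hypothetical, and the register test assumes general registers named r0 through r31.

```cpp
#include <algorithm>
#include <cctype>
#include <functional>
#include <string>

enum class StatementKind { Assignment, Instruction };

// Hypothetical register test, assuming general registers r0..r31.
bool isToyHexagonRegister(const std::string &Name) {
  if (Name.size() < 2 || Name.size() > 3 || Name[0] != 'r')
    return false;
  bool AllDigits = std::all_of(Name.begin() + 1, Name.end(), [](char C) {
    return std::isdigit(static_cast<unsigned char>(C)) != 0;
  });
  return AllDigits && std::stoi(Name.substr(1)) <= 31;
}

// Treats "identifier '=' ..." as an assembler assignment only when the target
// does not recognise the identifier as one of its registers. Directives and
// labels are outside the scope of this toy and are not modelled.
StatementKind classifyStatement(
    const std::string &IDVal, const std::string &NextTok,
    const std::function<bool(const std::string &)> &IsTargetRegister) {
  if (NextTok == "=" && !IsTargetRegister(IDVal))
    return StatementKind::Assignment;
  return StatementKind::Instruction;
}
```

With this rule `r0 = r1` is handed to the instruction matcher while `foo = 4` remains a symbol assignment. The cost is that a symbol named `r0` can no longer be assigned with `=`, which is the kind of syntactic ambiguity Jim's reply warns about.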

 

Regards,

David
