thr3ads.net - llvm dev - [LLVMdev] What does MCOperand model? [Sep 2012]

Greg Fitzgerald

2012-Sep-26 18:26 UTC

[LLVMdev] What does MCOperand model?

A question for LLVM code generator developers:

After having read through "The LLVM Target-Independent Code Generator"
[1] I'm unclear about what precisely the objects MCInst and MCOperand
represent.  They sit in the space between assembly syntax and binary
encodings, but which are they modeling?  For example, a Thumb 2 branch
instruction 'b' takes an immediate.  That syntax "b #1234" can map to
a couple different encodings.  If it is an even number between -2048
and 2046, it can be encoded with a 16-bit instruction, otherwise a
32-bit instruction.  If the MC objects are to model the syntax, then
one would expect both encodings to have identical values in the
MCOperand, a 32-bit signed integer.  On the other hand, if MC objects
are to model the encoding, one would expect the MCOperand for the
16-bit encoding to contain a number between -1024 and 1023.  Which one
is it?
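
To make the two candidate models concrete, here is a minimal C++ sketch
using the ranges from the paragraph above.  The helper names
(fitsNarrowBranch, syntaxOperand, encodedOperand) are hypothetical and are
not LLVM APIs, and the sketch ignores how a real assembler derives a branch
offset from a label.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for illustration only; none of these exist in LLVM.

// True if a byte offset fits the 16-bit branch described above:
// an even number in [-2048, 2046].
inline bool fitsNarrowBranch(int32_t byteOffset) {
  return byteOffset >= -2048 && byteOffset <= 2046 && byteOffset % 2 == 0;
}

// "Syntax" model: the operand holds the number exactly as written.
inline int32_t syntaxOperand(int32_t byteOffset) { return byteOffset; }

// "Encoding" model: the operand holds the halfword count that goes into
// the instruction field, i.e. a value in [-1024, 1023].
// Only meaningful when fitsNarrowBranch(byteOffset) is true.
inline int32_t encodedOperand(int32_t byteOffset) { return byteOffset / 2; }
```

Under these assumptions, "b #1234" would carry 1234 in the syntax model and
617 in the encoding model.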

My intuition says the MCOperand should model the assembly syntax and
contain the 32-bit signed integer, and that the EncoderMethod and
DecoderMethod are responsible for mapping that high-level number to
the low-level binary representation.  If, however, the MCOperand
models the encoding, then EncoderMethod and DecoderMethod glue need
not exist, and that bit-twiddling logic would be pushed to whoever
creates the MCOperand.
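
As an illustration of the glue described above, the following sketch maps a
syntax-level byte offset to the field of the 16-bit encoding and back.  The
function names are hypothetical stand-ins and do not follow the signatures
LLVM expects for an EncoderMethod or DecoderMethod; the 11-bit field width
is an assumption inferred from the -1024 to 1023 range mentioned earlier.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical encoder direction: byte offset -> 11-bit two's-complement
// field.  The caller is assumed to have checked that the offset is even
// and within [-2048, 2046].
inline uint32_t encodeBranchField(int32_t byteOffset) {
  return static_cast<uint32_t>(byteOffset / 2) & 0x7FFu;
}

// Hypothetical decoder direction: 11-bit field -> byte offset.
// Sign-extend from bit 10, then scale back to bytes.
inline int32_t decodeBranchField(uint32_t field) {
  int32_t halfwords = static_cast<int32_t>(field & 0x7FFu);
  if (halfwords & 0x400)
    halfwords -= 0x800;
  return halfwords * 2;
}
```

With the syntax model, both functions live next to the encoder and decoder;
with the encoding model, the same arithmetic moves to whoever builds or
prints the operand.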

Looking at the Thumb backend, I believe it has been written assuming
the MC objects model the syntax, not the encoding, which matches my
intuition.  There has been some discussion on the llvm-commits list
encouraging us to store the encoded value in the MCOperand.  The
justification, as I understand it, is that the MCOperand should not
contain values that cannot be encoded.  This effectively means that
the MCOperands would be modeling the binary encoding, not the syntax.
Are folks making this transition in other backends as well?

[1] http://llvm.org/docs/CodeGenerator.html

Thanks,
Greg

Jim Grosbach

2012-Sep-26 21:02 UTC

Owen is correct in his descriptions. The MCOperand values are intended to model
the instruction encoding. Where that doesn't match the assembly syntax, the
asm parser (and codegen) and the instruction printer are responsible for
encoding/decoding the values.

For targets that predate the MC layer, this isn't always the case, leading
to things being a bit confusing when just reading the code. Any new targets
should absolutely consider the instruction encoding to be the canonical
representation and map assembly syntax onto that, not the other way around.
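
A rough sketch of that division of labor, assuming the 16-bit branch from
the original question: the parser rejects text it cannot encode and stores
the encoded value, and the printer converts it back to assembly syntax.
EncodedOperand and both functions are hypothetical stand-ins, not the real
MCOperand, asm parser, or instruction printer interfaces.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for an operand that stores the encoded value.
struct EncodedOperand {
  int64_t Imm; // halfword count, as it appears in the instruction field
};

// Parser side: accept text such as "#1234", reject values the 16-bit
// encoding cannot hold, and store the encoded (halfword) value.
inline bool parseNarrowBranchTarget(const std::string &text,
                                    EncodedOperand &out) {
  if (text.size() < 2 || text[0] != '#')
    return false;
  long long bytes = 0;
  try {
    std::size_t used = 0;
    bytes = std::stoll(text.substr(1), &used);
    if (used != text.size() - 1)
      return false; // trailing junk after the number
  } catch (const std::exception &) {
    return false; // not a number, or out of range for long long
  }
  if (bytes % 2 != 0 || bytes < -2048 || bytes > 2046)
    return false; // not representable in the 16-bit encoding
  out.Imm = bytes / 2;
  return true;
}

// Printer side: map the encoded value back to assembly syntax.
inline std::string printNarrowBranchTarget(const EncodedOperand &op) {
  return "#" + std::to_string(op.Imm * 2);
}
```

In this arrangement the operand itself never holds an odd or out-of-range
byte offset; the conversion happens at the edges.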

Regards,
-Jim

Greg Fitzgerald

2012-Sep-27 01:36 UTC

> the MCOperand should not contain values that cannot be encoded

In the case of pre-encoding a shifted immediate, do we acknowledge that
we've only moved the invalid encodings from ones that set the bottom
bit to ones that set the top bit?
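
The concern can be illustrated with a small sketch, again using the 16-bit
branch from earlier in the thread.  Both predicates are hypothetical and
only show which operand values each model would have to reject.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical predicates for illustration; not LLVM APIs.

// Syntax model: the operand holds a byte offset.  Odd values (bottom bit
// set) and values outside [-2048, 2046] cannot be encoded.
inline bool syntaxValueIsEncodable(int64_t bytes) {
  return bytes % 2 == 0 && bytes >= -2048 && bytes <= 2046;
}

// Encoding model: the operand holds the pre-shifted halfword count.  The
// bottom-bit problem is gone, but the operand is still a wide integer, so
// any value outside the 11-bit signed range [-1024, 1023] is unencodable.
inline bool encodedValueIsEncodable(int64_t halfwords) {
  return halfwords >= -1024 && halfwords <= 1023;
}
```

Either way, the integer stored in the operand can still hold values that
have no encoding; pre-encoding changes which ones.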

Is there a backend that is implemented in this style I can use as a reference?

> The MCOperand values are intended to model the instruction
> encoding. Where that doesn't match the assembly syntax,
> the asm parser (and codegen) and the instruction printer are
> responsible for encoding/decoding the values.

As my colleague and I try to implement instructions in the recommended
style, we are finding it harder because of the constraint that the
MCOperand must be pre-encoded.  I've attached a diagram of my
understanding of the code flow when using the recommended style versus
using the MCOperand to model the syntax.  How far off am I?

[See attached]

In the diagram with pre-encoding, a shared function EncodeImm() has to
be referenced from 3 locations.  As a newcomer to LLVM going after a
simple encoding bug, I wasn't expecting to have to grok every client
of MCOperand just to fix how it is encoded.  To pre-encode, it seems
the .td file needs to use a custom operand that inherits from a
generic one for the sole purpose of routing to the shared encoding
function.  Is there a better alternative for getting from the LLVM
target-independent IR to the pre-encoded MCOperand?

Thanks,
Greg

