thr3ads.net - llvm dev - [LLVMdev] X86 disassembler 0x66 prefix [Dec 2010]

If this information is useful, please help other people find it:
Share via:

Olivier Meurant

2010-Dec-22 16:31 UTC

[LLVMdev] X86 disassembler 0x66 prefix

There is a problem on X86 disassembler for instructions beginning with x86
prefix :

$ echo "0x66 0x0f 0x6f 0x8f 0x00 0x00 0x00 0x00" | llvm-mc
--disassemble
        movdqa  (%edi), %xmm1

$ echo "0x53 0x66 0x0f 0x6f 0x8f 0x00 0x00 0x00 0x00" | llvm-mc
--disassemble
        pushl   %ebx
<stdin>:1:6: warning: invalid instruction encoding
0x53 0x66 0x0f 0x6f 0x8f 0x00 0x00 0x00 0x00
     ^


I attach a patch, but I'm not really familiar with this code, so if someone
can review/correct, it would be great.

The problem is between readPrefixes and getId : prefixLocation depends on
buffer position, but necessaryPrefixLocation is never updated to match
buffer position.

Cheers,
Olivier.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20101222/4e629a89/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: x86_disassembler_66_prefix.patch
Type: text/x-patch
Size: 688 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20101222/4e629a89/attachment.bin>

Sean Callanan

2010-Dec-23 02:17 UTC

head link

[LLVMdev] X86 disassembler 0x66 prefix

Olivier,

thanks for identifying that issue!  Your fix corrects the specific testcase you
provide, but it could produce incorrect results in other cases.

Here's a bit of background on necessary_prefix_location.

necessary_prefix_location is supposed to keep track of where prefixes like 66,
f3, and f2 – which affect the class of instruction being decoded fundamentally –
should be expected to be.  If you look at the "Instruction Format"
section of the Intel manuals (specifically, Volume 2A of the Intel 64 and IA-32
Architectures Software Developer's Manual), it specifies that

–
Some instructions may use F2H,F3H as a mandatory prefix to express distinct
functionality. A mandatory prefix generally should be placed after other
optional prefixes (exception to this is discussed in Section 2.2.1, “REX
Prefixes”).
–

Also,

–
Some SSE2/SSE3/SSSE3/SSE4 instructions and instructions using a three-byte
sequence of primary opcode bytes may use 66H as a mandatory prefix to express
distinct functionality. A mandatory prefix generally should be placed after
other optional prefixes (exception to this is discussed in Section 2.2.1, “REX
Prefixes”).
–

This is why the decoder maintains a special location – necessary_prefix_location
– at which f2, f3, or 66 must reside.  Perhaps I ought to call it
"mandatory_prefix_location"...

In the 64-bit case (under "REX Prefixes"), the manual states:

–
Only one REX prefix is allowed per instruction. If used,  the REX prefix byte
must immediately precede the opcode byte or the escape opcode byte (0FH). When a
REX prefix is used in conjunction with an instruction containing a mandatory
prefix,  the mandatory prefix must come before the REX so the REX prefix can be
immediately preceding the opcode or the escape byte.
–

I took a quick look in the sources and saw that necessaryPrefixLocation was
being set at two locations: line 382, and line 387.  Those are both only enabled
if insn->mode == MODE_64BIT, which in X86 (of course) it isn't.  They are
handling the specific 64-bit case; the 32-bit case is unhandled (as you
spotted).

Your patch would set necessaryPrefixLocation to wherever the 66 prefix is found,
which works most of the time but would break cases in which 66 is not used as a
mandatory prefix, but merely as an optional prefix (the operand-size prefix). 
Although obscure, such a use is possible.  I have attached a patch that solves
the problem more generally.

Sean

On Dec 22, 2010, at 8:31 AM, Olivier Meurant wrote:
> There is a problem on X86 disassembler for instructions beginning with x86
prefix :
> 
> $ echo "0x66 0x0f 0x6f 0x8f 0x00 0x00 0x00 0x00" | llvm-mc
--disassemble
>         movdqa  (%edi), %xmm1
> 
> $ echo "0x53 0x66 0x0f 0x6f 0x8f 0x00 0x00 0x00 0x00" | llvm-mc
--disassemble
>         pushl   %ebx
> <stdin>:1:6: warning: invalid instruction encoding
> 0x53 0x66 0x0f 0x6f 0x8f 0x00 0x00 0x00 0x00
>      ^
> 
> 
> I attach a patch, but I'm not really familiar with this code, so if
someone can review/correct, it would be great.
> 
> The problem is between readPrefixes and getId : prefixLocation depends on
buffer position, but necessaryPrefixLocation is never updated to match buffer
position.
> 
> Cheers,
> Olivier.
> 
> 
>
<x86_disassembler_66_prefix.patch>_______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20101222/692701b0/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: llvm.necessaary-prefix-location.diff
Type: application/octet-stream
Size: 445 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20101222/692701b0/attachment.obj>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20101222/692701b0/attachment-0001.html>

Reasonably Related Threads

Search for more maybe matching threads

llvm dev - Dec 2010 - [LLVMdev] X86 disassembler 0x66 prefix

[LLVMdev] X86 disassembler 0x66 prefix

[LLVMdev] X86 disassembler 0x66 prefix

Reasonably Related Threads