Hi,
I think the current X86 disassembler is quite broken and fails badly when
handling REX prefixes in x86_64 code.
Below are some examples:
$ echo "0x0f,0xeb,0xc3"|./Release+Asserts/bin/llvm-mc -disassemble -triple=x86_64
.text
por %mm3, %mm0

$ echo "0x40,0x0f,0xeb,0xc3"|./Release+Asserts/bin/llvm-mc -disassemble -triple=x86_64
.text
por %mm3, %mm0

$ echo "0x41,0x0f,0xeb,0xc3"|./Release+Asserts/bin/llvm-mc -disassemble -triple=x86_64
.text
<stdin>:1:1: warning: invalid instruction encoding
0x41,0x0f,0xeb,0xc3
^
The last example should also decode to "por %mm3, %mm0", but the
disassembler rejects the input.
The cause is this line in X86DisassemblerDecoder.cpp:
    rm |= bFromREX(insn->rexPrefix) << 3;
Here REX.B is folded into the rm field, but for "por" (0F EB) REX.B
should be ignored.
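To make the failure concrete, here is a minimal sketch of the arithmetic behind that line, applied to the bytes from the examples above. Only bFromREX is named in the message; rmFromModRM, extendedRM and fitsMMX are hypothetical helpers written for illustration, not code from the LLVM tree.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins; only bFromREX is named in the message above.
constexpr unsigned bFromREX(uint8_t rex) { return rex & 0x1; }        // REX.B
constexpr unsigned rmFromModRM(uint8_t modrm) { return modrm & 0x7; } // ModRM.rm

// Mirrors `rm |= bFromREX(insn->rexPrefix) << 3;` for a single ModRM byte.
// Pass rex = 0 when the instruction has no REX prefix.
inline unsigned extendedRM(uint8_t rex, uint8_t modrm) {
  assert(rex == 0 || (rex & 0xf0) == 0x40); // REX prefixes are 0x40..0x4f
  unsigned rm = rmFromModRM(modrm);
  rm |= bFromREX(rex) << 3;
  return rm;
}

// MMX has only eight registers (mm0..mm7), so an index above 7 has no
// register to map to unless the extension bit is ignored.
inline bool fitsMMX(unsigned index) { return index <= 7; }
```

With ModRM byte 0xc3 the rm field is 3. A REX prefix of 0x40 leaves it at 3, while 0x41 sets REX.B and turns it into 11, which is outside the mm0..mm7 range and so gets reported as an invalid encoding.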
Quite a lot of other instructions take REX into account like this,
although according to the manual REX should be ignored for them.
I don't see a clean solution for this issue without significant changes
to the way we decode ModRM and without providing more information in the
.td files.
Any ideas?
Thanks,
Jun
Craig Topper
2014-Dec-24 06:43 UTC
[LLVMdev] X86 disassembler is quite broken on handling REX
I believe this particular error is caused by the code below. It seems easy
enough to just drop the bit there. Do you have other non-MMX examples?
    case TYPE_MM: \
      if (index > 7) \
        *valid = 0; \
      return prefix##_MM0 + index;
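One way "dropping the bit" could look is sketched below. This is an assumption about the shape of the fix, not the actual patch: the operand kinds, the register numbering (kRAX, kXMM0, kMM0) and the function fixupReg are hypothetical stand-ins for the macro-generated code quoted above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical operand kinds and register numbering, for illustration only.
enum class OperandType { GPR64, XMM, MM };
constexpr unsigned kRAX = 0, kXMM0 = 16, kMM0 = 32;

// Maps a 4-bit register index (3 ModRM bits plus one REX bit) to a register
// number. For MM operands the REX bit is masked off instead of being rejected.
inline unsigned fixupReg(OperandType type, unsigned index, bool *valid) {
  assert(index < 16);
  *valid = true;
  switch (type) {
  case OperandType::GPR64: return kRAX + index;
  case OperandType::XMM:   return kXMM0 + index;
  case OperandType::MM:    return kMM0 + (index & 0x7);
  }
  *valid = false; // unknown operand type
  return 0;
}
```

In this sketch the decision is keyed on the operand type alone, so an index of 11 with an MM operand maps to the same register as index 3, while XMM and general-purpose operands keep the extended index.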
--
~Craig
On Wed, Dec 24, 2014 at 2:43 PM, Craig Topper <craig.topper at gmail.com> wrote:
> I believe this particular error is caused by this. That seems easy enough
> to just drop the bit. Do you have other non-mmx examples?
>
> case TYPE_MM: \
> if (index > 7) \
> *valid = 0; \
> return prefix##_MM0 + index;
>
Yes, exactly this place. But the question is: how do we know when to drop
REX.B?
I don't know of any non-MMX examples. It seems only MMX-related instructions
have this issue.
Thanks,
Jun