thr3ads.net - llvm dev - [LLVMdev] X86 Disassembler [Sep 2009]

If this information is useful, please help other people find it:
Share via:

Sean Callanan

2009-Aug-19 23:39 UTC

[LLVMdev] X86 Disassembler

Bill,

thanks for your comments.  I'll respond to them individually.  I've  
attached a new revision of the patch that addresses them.  Patch built  
and tested against SVN 79487, with the additional attached fix that  
fixes an Intel table bug.

Sean

On 2009/08/18, at 0:57, Bill Wendling wrote:> 0. Watch out for tabs!
Fixed.  Thanks.
> 1. Includes like this "#include <llvm/MC/MCInst.h>" should
be in
> quotes, not angle brackets.
Fixed.
> 2. In the "class Indenter", the "push" and
"pop" methods check for
> over/under runs before proceeding. Should these be "asserts"
instead?
Yeah, I made them asserts as you suggested.  I was thinking about  
maybe making them saturate, but honestly if we run out of space we  
should simply break and allow the developer to make the Indenter  
bigger.  Maybe we should have another argument to the constructor  
specifying how large the buffer should be...?
> 3. In include/llvm/Support/MemoryObject.h, you changed all  
> "uintptr_t" to "uint64_t". Is there a good reason for
this? If not,
> please change them back.'
Yes, there is.  If you're disassembling in a memory space that's  
larger than yours (for example, on a computer where LLVM is built 32- 
bit but the source of the bytes is a 64-bit address space), then I  
don't want to be limited by the local process's notion of uintptr_t.
> 4. If you have a class that's completely contained within a .cpp  
> file, go ahead and define it in an anonymous namespace and mark it  
> "VISIBILITY_HIDDEN". Like this:
>
> namespace {
> class VISIBILITY_HIDDEN StringMemoryObject : public MemoryObject {
> ...
I used VISIBILITY_HIDDEN for both StringMemoryObject and  
X86DisassemblerEmitter::X86DEBackend.
I wrapped StringMemoryObject in an anonymous namespace.
I didn't wrap X86DEBackend because it's already nested.
> 5. In "readBytes" in StringMemoryObject, you're returning a
"-1",
> but the method's return type is "uint64_t". Do you *really*
want to
> return 0xfffffffffffU, or is "-1" a sentinel value of some sort?
If
> the latter, perhaps changing the return type to "int64_t"
instead.
> If the former, then changing the return to "-1U" to make it
explicit.
-1 is a sentinel value.  I changed readBytes to return an int and  
accept a uint64_t* that it fills in if it's non-NULL.
> 6. The use of "#include <iostream>" is strictly forbidden.
Please
> remove it wherever you find it (HexDisassembler.h for one).
Yeah, I guess those static initializers aren't so great.  Removed.
> 7. Look for opportunities to use the LLVM small container classes.  
> For instance, the "fOperands" ivar in RecognizableInsn.
I looked at my uses of std::vector but they wouldn't really benefit  
from the good performance of SmallVector for small values of N.

- fInstructions in X86Disassembler.h is not a good candidate for this,  
because the number of instructions could potentially be quite large.
- numberedInstructions in X86DisassemblerEmitter.cpp is not a good  
candidate because there are upwards of a thousand instructions in the  
x86 table.
- fInstructionSpecifiers in X86DisassemblerTables.h is not a good  
candidate either... the number is (again) very large.
- fOperands is not a good candidate because is never actually  
created.  It is always set to point to CodeGenInstruction::OperandList.
> 8. While we're on that, please use "Instruction" or
"Instr" instead
> of "Insn" for consistency's sake.
Fixed.  Thanks for the heads-up.
> 9. You have types called "operandType_t", etc. We normally
don't use
> the "_t" suffix for types. Perhaps renaming them.
Fixed.  Now for example operandType_t is OperandType.
> 10. In the "static inline bool outranks(instructionContext_t upper,  
> instructionContext_t lower)", it would be good to assert that upper  
> and lower are strictly less than IC_max.
Asserted.
> 11. In several places, you put 'default: assert(0 &&
"Badness
> happens here!");" at the end of a switch statement. When
assertions
> are turned off, I believe that this could emit warnings. It would be  
> better to put it as the first case of the "switch". And maybe use
> the new "llvm_unreachable()" macro as well.
All right, I moved them and used llvm_unreachable().  In the C code,  
where I can't safely #include "llvm/Support/ErrorHandling.h," I  
declared a macro that does the same thing.
> 12. You use "std::endl" a lot. This is discouraged by the coding
> standard:
>
> 	http://llvm.org/docs/CodingStandards.html#ll_avoidendl
Purged.
> 13. You use "std::ostream" a lot. Would it be appropriate to use
> "raw_ostream" instead?
To make sure that the table definition file looks pretty, I use  
<iomanip>, and none of those functions work with raw_ostreams.  I  
actually only create three std::ostringstreams – during table  
generation – so for now I'm going to leave that the way it is.
> 14. In DisassemblerTables::setTableFields, you emit things to  
> "std::cerr". Use "errs()" instead. Or, better yet,  
> "llvm_report_error". :-)
Fixed to use errs().
> 15. Prefer pre-increment/decrement for variables:
>
> 	http://llvm.org/docs/CodingStandards.html#micro_preincrement
Learned a new one there.  Thanks!
> 16. In X86RecognizableInsn.cpp, you use the large macro:  
> HANDLE_OPERAND. Could you make it an inlined function instead? It  
> could take a pointer to a member function for the method it wants to  
> call. Debugging large macros is impossible. :)
I factored the code out into a function.  There are still macros to  
keep things pretty, but they just wrap the function.
> 17. I'm confused by the C files in lib/Target/X86. Why aren't they
C+
> + files?
The reason is that I am aware of clients that need disassembly but  
can't link in C++ code, so for those clients I've separated out a C  
core.  They won't get the benefits of the MCInst, but that's their  
problem :)
> Okay. That's all I found for now. :-) I haven't looked at the  
> algorithm too much, though it seems fairly straight-forward. I'm  
> sure that since you're continually testing it, that it's of high  
> quality. When you do get around to submission, we will want all of  
> the tests you've been running to be added to the test/ directory.
>
> Thanks!
> -bw
Sean

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20090819/265a438b/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: llvm.diff
Type: application/octet-stream
Size: 232883 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20090819/265a438b/attachment.obj>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20090819/265a438b/attachment-0001.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: pcmpestrm.diff
Type: application/octet-stream
Size: 779 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20090819/265a438b/attachment-0001.obj>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20090819/265a438b/attachment-0002.html>

Chris Lattner

2009-Aug-22 23:31 UTC

head link

[LLVMdev] X86 Disassembler

On Aug 19, 2009, at 4:39 PM, Sean Callanan wrote:> thanks for your comments.  I'll respond to them individually.  I've
> attached a new revision of the patch that addresses them.  Patch  
> built and tested against SVN 79487, with the additional attached fix  
> that fixes an Intel table bug.
Thanks Sean, comments below.  Are you sure you attached the updated  
patch?  I still see a lot of std::endl's etc.
>> 3. In include/llvm/Support/MemoryObject.h, you changed all  
>> "uintptr_t" to "uint64_t". Is there a good reason
for this? If not,
>> please change them back.'
>
> Yes, there is.  If you're disassembling in a memory space that's  
> larger than yours (for example, on a computer where LLVM is built 32- 
> bit but the source of the bytes is a 64-bit address space), then I  
> don't want to be limited by the local process's notion of
uintptr_t.
Makes perfect sense, please add a comment in the MemoryObject doxygen  
comment that explains this.  You can split this change out of your  
megapatch and commit it separately.
>> 13. You use "std::ostream" a lot. Would it be appropriate to
use
>> "raw_ostream" instead?
>
> To make sure that the table definition file looks pretty, I use  
> <iomanip>, and none of those functions work with raw_ostreams.  I  
> actually only create three std::ostringstreams – during table  
> generation – so for now I'm going to leave that the way it is.
I actually find iomanip to be pretty ugly for simple things.  llvm/ 
Support/Format.h lets you do things like:

OS << "mynumber: " << format("%4.5f", 1234.412)
<< '\n';

with raw_ostreams.

Using "printf style" format strings is a lot more pretty than:

+#define BYTE_FLAGS std::hex << std::setw(2) <<
std::setfill('0') <<
std::right

:)

Not a critical issue, but please put a space after control flow  
keywords like if/while.
      while(current - address < size && current < limit) {
        if(readByte(current, &buf[(current - address)]))

Please stick to plain ascii letters in source files:
+ * The raison d'être for this header file is to provide those  
definitions that

One meta comment: it is generally preferable to split out patches into  
small pieces.  Please propose the APIs without the X86 implementation  
for the next round, it should be much easier to get that piece in than  
doing the whole patch at once.

Some micro level coding stuff:

+#include <llvm/MC/MCInst.h>
+#include <llvm/Support/MemoryObject.h>
+#include <llvm/Support/raw_ostream.h>

Please use "" instead of <>.  You can also just forward declare
all
these classes instead of #including their headers.

+#include "assert.h"

Please use the c++ version of c headers when possible: <cassert>.

+  MCDisassembler(MemoryObject& region,
+                 raw_ostream& vStream) {

Please use "foo_t &f" instead of "foo_t& f".

+  virtual ~MCDisassembler() {
+  }

Make sure each class with a virtual method has at least one method  
defined out of line:
http://llvm.org/docs/CodingStandards.html#ll_virtual_anch

Some bigger stuff:

include/llvm/MC/MCDisassembler.h seems to have four copies of itself  
in the same file.

In terms of API design:

+  /// Constructor     - Performs initial setup for the disassembler.   
This may
+  ///                   do the work of disassembly, or may  
(especially on
+  ///                   fixed-instruction-length CPUs) set things up  
for lazy
+  ///                   disassembly.
+  ///
+  /// @param region   - The memory object to use as a source for  
machine code.
+  /// @param vStream  - The stream to print warnings and diagnostic  
messages on.
+  MCDisassembler(MemoryObject& region,
+                 raw_ostream& vStream) {
+  }

Forcing diagnostics to get rendered to a stream is somewhat  
unfortunate.  Different clients may want different information.  Would  
it be possible to use error codes, or make each action (like  
getInstruction) fill in an error code struct with possible failure info?

The MCDisassembler API is also somewhat strange to me.  Right now it  
is designed to analyze a memory region when constructed and contain  
them.  It seems that there is a more useful low level API which would  
look something like this:

class MCDisassembler {
public:

   virtual ~MCDisassembler();

   /// DisassembleInstruction - Disassemble the first instruction in  
the specified region, printing the disassembled instruction to the  
specified raw_ostream, and returning the size of the instruction in  
bytes.  On error, this returns zero and fills in ErrorInfo with a  
human readable description of the error.
   virtual unsigned DisassembleInstruction(MemoryObject &region,  
raw_ostream &OS, std::string &ErrorInfo) = 0;

}

Having this sort of stateless API means that higher level clients  
(which are stateful) can be built on top of them without adding  
overhead to clients who don't care about the state.

+++ include/llvm/Support/Indenter.h	(revision 0)
@@ -0,0 +1,72 @@
+//===- Indenter.h - Output indenter -----------------------------*- C+ 
+ -*-===//

I'm not a big fan of manipulators like this (as David Greene can tell  
you ;-).  I just added a new raw_ostream::indent method, so now  
instead of stuff like this:

+void DisassemblerTables::emitContextDecision(
+  std::ostream& o2,
+  Indenter& i2,
...

+  o2 << i2.indent() << "struct ContextDecision " <<
name << " = {" <<
std::endl;
+  i2.push();
+  o2 << i2.indent() << "{ /* opcodeDecisions */" <<
std::endl;
+  i2.push();

You can use:

+void DisassemblerTables::emitContextDecision(
+  raw_ostream &o2,
+  unsigned &Indent2,

+  o2.indent(Indent2++) << "struct ContextDecision " <<
name << " =
{\n";
+  o2.indent(Indent2++) << "{ /* opcodeDecisions */\n";

Which reads better, is more efficient, and scales to more than 256  
levels of indentation.

+#include <fcntl.h>      // for fcntl, open
+#include <stdio.h>      // for perror
+#include <sys/types.h>  // for read
+#include <sys/uio.h>    // for read
+#include <unistd.h>     // for read

unistd etc won't work on windows.  Please use the llvm/Support/ 
MemoryBuffer.h API for reading files.  The LineReader is easier to  
implement with MemoryBuffer because you just scan for \n and \r in the  
memory range.

+std::string stringFromMCInst_x86(const MCInst& insn,
+                                 const Target* target,
+                                 std::string& triple) {

Please mark this static so that you can shrink the anon namespace.

+  X86ATTAsmPrinter* asmPrinter +   
dynamic_cast<X86ATTAsmPrinter*>(functionPass);

Please don't use dynamic_cast: we're trying to eliminate RTTI a C cast  
should be fine.

+int HexDisassembler::disassemble(const Target* target,
+                                 LineReader& reader) {
...

+
+    if(targetName == "x86") {
+      disasm = new X86Disassembler::X86_32Disassembler(obj, logstream);
+      triple = "x86-apple-darwin";
+    }
+    else if(targetName == "x86-64") {
+      disasm = new X86Disassembler::X86_64Disassembler(obj, logstream);
+      triple = "x86_64-apple-darwin";
+    }
+    else {

Instead of doing something like this, the X86 disassembler should  
register itself as part of the "Target" API the same way that  
asmprinters, MCAsmInfo, targets, etc do.  The X86 specific  
disassembler should go in lib/Target/X86/Disassembler.  Take a look at  
how the Target/X86/AsmParser stuff registers itself for a good example.

--- tools/llvm-mc/HexDisassembler.h	(revision 0)
+++ tools/llvm-mc/HexDisassembler.h	(revision 0)

+#include <fstream>
+#include <string>

the fstream #include is not needed.  It might make sense to change the  
disassemble APIs to take a MemoryBuffer, and have the llvm-mc.cpp  
driver just handle all the file reading stuff.

I haven't reviewed the X86 specific changes yet, it would be good to  
get the high level design sorted out first, I'll review it in the next  
patch.  Please commit the MemoryObject.h changes (int64_t + comments)  
whenever you feel like it, and propose the basic API changes without  
the X86 specific part next.

Thanks for working on this Sean, it's great work!

-Chris

Bill Wendling

2009-Aug-25 08:41 UTC

head link

[LLVMdev] X86 Disassembler

On Aug 19, 2009, at 4:39 PM, Sean Callanan wrote:
> Bill,
>
> thanks for your comments.  I'll respond to them individually.  I've
> attached a new revision of the patch that addresses them.  Patch  
> built and tested against SVN 79487, with the additional attached fix  
> that fixes an Intel table bug.
>Hi Sean,

Thanks for the fixes!
>> 1. Includes like this "#include <llvm/MC/MCInst.h>"
should be in
>> quotes, not angle brackets.
>
> Fixed.
>I still saw a few more sneaking in there. Also, don't put system  
headers (like assert.h, cstdio, etc.) in quotes. I just meant the LLVM  
headers. :-)
>> 2. In the "class Indenter", the "push" and
"pop" methods check for
>> over/under runs before proceeding. Should these be "asserts"
instead?
>
> Yeah, I made them asserts as you suggested.  I was thinking about  
> maybe making them saturate, but honestly if we run out of space we  
> should simply break and allow the developer to make the Indenter  
> bigger.  Maybe we should have another argument to the constructor  
> specifying how large the buffer should be...?
>Either that or make it a SmallVector with initial size "256". I  
*think* that all of the elements in that vector are guaranteed to be  
contiguous. (Might want to check.) But that way you won't have to  
worry about the array growing too big and needing more space.
>> 3. In include/llvm/Support/MemoryObject.h, you changed all  
>> "uintptr_t" to "uint64_t". Is there a good reason
for this? If not,
>> please change them back.'
>
> Yes, there is.  If you're disassembling in a memory space that's  
> larger than yours (for example, on a computer where LLVM is built 32- 
> bit but the source of the bytes is a 64-bit address space), then I  
> don't want to be limited by the local process's notion of
uintptr_t.
>I see. As Chris said, that makes sense. :) I was just worried that  
bits would fall on the floor if you convert a pointer value to a fixed  
bit-width size. Of course, when we go to 128-bit processors, we'll  
have to change this. :-D
>> 4. If you have a class that's completely contained within a .cpp  
>> file, go ahead and define it in an anonymous namespace and mark it  
>> "VISIBILITY_HIDDEN". Like this:
>>
>> namespace {
>> class VISIBILITY_HIDDEN StringMemoryObject : public MemoryObject {
>> ...
>
> I used VISIBILITY_HIDDEN for both StringMemoryObject and  
> X86DisassemblerEmitter::X86DEBackend.
> I wrapped StringMemoryObject in an anonymous namespace.
> I didn't wrap X86DEBackend because it's already nested.
>Thanks.
>> 5. In "readBytes" in StringMemoryObject, you're returning
a "-1",
>> but the method's return type is "uint64_t". Do you
*really* want to
>> return 0xfffffffffffU, or is "-1" a sentinel value of some
sort? If
>> the latter, perhaps changing the return type to "int64_t"
instead.
>> If the former, then changing the return to "-1U" to make it
explicit.
>
> -1 is a sentinel value.  I changed readBytes to return an int and  
> accept a uint64_t* that it fills in if it's non-NULL.
>Thanks. I'm not a huge fan of sentinel values if they can be avoided.
>> 7. Look for opportunities to use the LLVM small container classes.  
>> For instance, the "fOperands" ivar in RecognizableInsn.
>
> I looked at my uses of std::vector but they wouldn't really benefit  
> from the good performance of SmallVector for small values of N.
>
> - fInstructions in X86Disassembler.h is not a good candidate for  
> this, because the number of instructions could potentially be quite  
> large.
> - numberedInstructions in X86DisassemblerEmitter.cpp is not a good  
> candidate because there are upwards of a thousand instructions in  
> the x86 table.
> - fInstructionSpecifiers in X86DisassemblerTables.h is not a good  
> candidate either... the number is (again) very large.
> - fOperands is not a good candidate because is never actually  
> created.  It is always set to point to  
> CodeGenInstruction::OperandList.
>Thanks for checking!
>> 11. In several places, you put 'default: assert(0 &&
"Badness
>> happens here!");" at the end of a switch statement. When
assertions
>> are turned off, I believe that this could emit warnings. It would  
>> be better to put it as the first case of the "switch". And
maybe
>> use the new "llvm_unreachable()" macro as well.
>
> All right, I moved them and used llvm_unreachable().  In the C code,  
> where I can't safely #include "llvm/Support/ErrorHandling.h,"
I
> declared a macro that does the same thing.
>Great!
>> 15. Prefer pre-increment/decrement for variables:
>>
>> 	http://llvm.org/docs/CodingStandards.html#micro_preincrement
>
> Learned a new one there.  Thanks!
>No prob. :-)
>> 16. In X86RecognizableInsn.cpp, you use the large macro:  
>> HANDLE_OPERAND. Could you make it an inlined function instead? It  
>> could take a pointer to a member function for the method it wants  
>> to call. Debugging large macros is impossible. :)
>
> I factored the code out into a function.  There are still macros to  
> keep things pretty, but they just wrap the function.
>Thanks!
>> 17. I'm confused by the C files in lib/Target/X86. Why aren't
they C
>> ++ files?
>
> The reason is that I am aware of clients that need disassembly but  
> can't link in C++ code, so for those clients I've separated out a C
> core.  They won't get the benefits of the MCInst, but that's their
> problem :)
>Ah! Okay...I wonder if it would be good to place them in the llvm-c  
bindings somehow? I honestly don't know much about those bindings, so  
this might not be feasible. But it would be good to keep icky C code  
out of pristine C++ code (tongue *firmly* planted in cheek).

-bw

Sean Callanan

2009-Sep-04 00:25 UTC

head link

[LLVMdev] X86 Disassembler

I was away doing other things for a while, but I have an API patch  
separated out, which (in addition to being much smaller than past  
megapatches) corrects two issues Chris identified in his most recent  
set of patches:

- First, it makes the API a good deal simpler.  Now, you can  
instantiate a single MCDisassembler and, each time you want an  
instruction disassembled, you can simply pass it a MemoryRegion, an  
offset, and an MCInst to populate.

- Second, it adds MCDisassembler to the list of things you can get  
from a TargetMachine, so you don't have to #include something from lib/ 
Target/X86 just to use the X86 disassembler.

Please let me know what you think of this API.  Thanks for your time.

Sean



I have included Chris's previous comments below.

On Aug 22, 2009, at 4:31 PM, Chris Lattner wrote:> The MCDisassembler API is also somewhat strange to me.  Right now it  
> is designed to analyze a memory region when constructed and contain  
> them.  It seems that there is a more useful low level API which  
> would look something like this:
>
> class MCDisassembler {
> public:
>
>  virtual ~MCDisassembler();
>
>  /// DisassembleInstruction - Disassemble the first instruction in  
> the specified region, printing the disassembled instruction to the  
> specified raw_ostream, and returning the size of the instruction in  
> bytes.  On error, this returns zero and fills in ErrorInfo with a  
> human readable description of the error.
>  virtual unsigned DisassembleInstruction(MemoryObject &region,  
> raw_ostream &OS, std::string &ErrorInfo) = 0;
>
> }
>
> Having this sort of stateless API means that higher level clients  
> (which are stateful) can be built on top of them without adding  
> overhead to clients who don't care about the state.
> Instead of doing something like this, the X86 disassembler should  
> register itself as part of the "Target" API the same way that  
> asmprinters, MCAsmInfo, targets, etc do.  The X86 specific  
> disassembler should go in lib/Target/X86/Disassembler.  Take a look  
> at how the Target/X86/AsmParser stuff registers itself for a good  
> example.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20090903/16292849/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: llvm-disassembler-api.diff
Type: application/octet-stream
Size: 10632 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20090903/16292849/attachment.obj>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20090903/16292849/attachment-0001.html>

Reasonably Related Threads

Search for more apparently analagous threads

llvm dev - Sep 2009 - [LLVMdev] X86 Disassembler

[LLVMdev] X86 Disassembler

[LLVMdev] X86 Disassembler

[LLVMdev] X86 Disassembler

[LLVMdev] X86 Disassembler

Reasonably Related Threads