Bill, thanks for your comments. I'll respond to them individually. I've attached a new revision of the patch that addresses them. Patch built and tested against SVN 79487, with the additional attached fix that fixes an Intel table bug. Sean On 2009/08/18, at 0:57, Bill Wendling wrote:> 0. Watch out for tabs!Fixed. Thanks.> 1. Includes like this "#include <llvm/MC/MCInst.h>" should be in > quotes, not angle brackets.Fixed.> 2. In the "class Indenter", the "push" and "pop" methods check for > over/under runs before proceeding. Should these be "asserts" instead?Yeah, I made them asserts as you suggested. I was thinking about maybe making them saturate, but honestly if we run out of space we should simply break and allow the developer to make the Indenter bigger. Maybe we should have another argument to the constructor specifying how large the buffer should be...?> 3. In include/llvm/Support/MemoryObject.h, you changed all > "uintptr_t" to "uint64_t". Is there a good reason for this? If not, > please change them back.'Yes, there is. If you're disassembling in a memory space that's larger than yours (for example, on a computer where LLVM is built 32- bit but the source of the bytes is a 64-bit address space), then I don't want to be limited by the local process's notion of uintptr_t.> 4. If you have a class that's completely contained within a .cpp > file, go ahead and define it in an anonymous namespace and mark it > "VISIBILITY_HIDDEN". Like this: > > namespace { > class VISIBILITY_HIDDEN StringMemoryObject : public MemoryObject { > ...I used VISIBILITY_HIDDEN for both StringMemoryObject and X86DisassemblerEmitter::X86DEBackend. I wrapped StringMemoryObject in an anonymous namespace. I didn't wrap X86DEBackend because it's already nested.> 5. In "readBytes" in StringMemoryObject, you're returning a "-1", > but the method's return type is "uint64_t". Do you *really* want to > return 0xfffffffffffU, or is "-1" a sentinel value of some sort? If > the latter, perhaps changing the return type to "int64_t" instead. > If the former, then changing the return to "-1U" to make it explicit.-1 is a sentinel value. I changed readBytes to return an int and accept a uint64_t* that it fills in if it's non-NULL.> 6. The use of "#include <iostream>" is strictly forbidden. Please > remove it wherever you find it (HexDisassembler.h for one).Yeah, I guess those static initializers aren't so great. Removed.> 7. Look for opportunities to use the LLVM small container classes. > For instance, the "fOperands" ivar in RecognizableInsn.I looked at my uses of std::vector but they wouldn't really benefit from the good performance of SmallVector for small values of N. - fInstructions in X86Disassembler.h is not a good candidate for this, because the number of instructions could potentially be quite large. - numberedInstructions in X86DisassemblerEmitter.cpp is not a good candidate because there are upwards of a thousand instructions in the x86 table. - fInstructionSpecifiers in X86DisassemblerTables.h is not a good candidate either... the number is (again) very large. - fOperands is not a good candidate because is never actually created. It is always set to point to CodeGenInstruction::OperandList.> 8. While we're on that, please use "Instruction" or "Instr" instead > of "Insn" for consistency's sake.Fixed. Thanks for the heads-up.> 9. You have types called "operandType_t", etc. We normally don't use > the "_t" suffix for types. Perhaps renaming them.Fixed. Now for example operandType_t is OperandType.> 10. In the "static inline bool outranks(instructionContext_t upper, > instructionContext_t lower)", it would be good to assert that upper > and lower are strictly less than IC_max.Asserted.> 11. In several places, you put 'default: assert(0 && "Badness > happens here!");" at the end of a switch statement. When assertions > are turned off, I believe that this could emit warnings. It would be > better to put it as the first case of the "switch". And maybe use > the new "llvm_unreachable()" macro as well.All right, I moved them and used llvm_unreachable(). In the C code, where I can't safely #include "llvm/Support/ErrorHandling.h," I declared a macro that does the same thing.> 12. You use "std::endl" a lot. This is discouraged by the coding > standard: > > http://llvm.org/docs/CodingStandards.html#ll_avoidendlPurged.> 13. You use "std::ostream" a lot. Would it be appropriate to use > "raw_ostream" instead?To make sure that the table definition file looks pretty, I use <iomanip>, and none of those functions work with raw_ostreams. I actually only create three std::ostringstreams – during table generation – so for now I'm going to leave that the way it is.> 14. In DisassemblerTables::setTableFields, you emit things to > "std::cerr". Use "errs()" instead. Or, better yet, > "llvm_report_error". :-)Fixed to use errs().> 15. Prefer pre-increment/decrement for variables: > > http://llvm.org/docs/CodingStandards.html#micro_preincrementLearned a new one there. Thanks!> 16. In X86RecognizableInsn.cpp, you use the large macro: > HANDLE_OPERAND. Could you make it an inlined function instead? It > could take a pointer to a member function for the method it wants to > call. Debugging large macros is impossible. :)I factored the code out into a function. There are still macros to keep things pretty, but they just wrap the function.> 17. I'm confused by the C files in lib/Target/X86. Why aren't they C+ > + files?The reason is that I am aware of clients that need disassembly but can't link in C++ code, so for those clients I've separated out a C core. They won't get the benefits of the MCInst, but that's their problem :)> Okay. That's all I found for now. :-) I haven't looked at the > algorithm too much, though it seems fairly straight-forward. I'm > sure that since you're continually testing it, that it's of high > quality. When you do get around to submission, we will want all of > the tests you've been running to be added to the test/ directory. > > Thanks! > -bwSean -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20090819/265a438b/attachment.html> -------------- next part -------------- A non-text attachment was scrubbed... Name: llvm.diff Type: application/octet-stream Size: 232883 bytes Desc: not available URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20090819/265a438b/attachment.obj> -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20090819/265a438b/attachment-0001.html> -------------- next part -------------- A non-text attachment was scrubbed... Name: pcmpestrm.diff Type: application/octet-stream Size: 779 bytes Desc: not available URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20090819/265a438b/attachment-0001.obj> -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20090819/265a438b/attachment-0002.html>
On Aug 19, 2009, at 4:39 PM, Sean Callanan wrote:> thanks for your comments. I'll respond to them individually. I've > attached a new revision of the patch that addresses them. Patch > built and tested against SVN 79487, with the additional attached fix > that fixes an Intel table bug.Thanks Sean, comments below. Are you sure you attached the updated patch? I still see a lot of std::endl's etc.>> 3. In include/llvm/Support/MemoryObject.h, you changed all >> "uintptr_t" to "uint64_t". Is there a good reason for this? If not, >> please change them back.' > > Yes, there is. If you're disassembling in a memory space that's > larger than yours (for example, on a computer where LLVM is built 32- > bit but the source of the bytes is a 64-bit address space), then I > don't want to be limited by the local process's notion of uintptr_t.Makes perfect sense, please add a comment in the MemoryObject doxygen comment that explains this. You can split this change out of your megapatch and commit it separately.>> 13. You use "std::ostream" a lot. Would it be appropriate to use >> "raw_ostream" instead? > > To make sure that the table definition file looks pretty, I use > <iomanip>, and none of those functions work with raw_ostreams. I > actually only create three std::ostringstreams – during table > generation – so for now I'm going to leave that the way it is.I actually find iomanip to be pretty ugly for simple things. llvm/ Support/Format.h lets you do things like: OS << "mynumber: " << format("%4.5f", 1234.412) << '\n'; with raw_ostreams. Using "printf style" format strings is a lot more pretty than: +#define BYTE_FLAGS std::hex << std::setw(2) << std::setfill('0') << std::right :) Not a critical issue, but please put a space after control flow keywords like if/while. while(current - address < size && current < limit) { if(readByte(current, &buf[(current - address)])) Please stick to plain ascii letters in source files: + * The raison d'être for this header file is to provide those definitions that One meta comment: it is generally preferable to split out patches into small pieces. Please propose the APIs without the X86 implementation for the next round, it should be much easier to get that piece in than doing the whole patch at once. Some micro level coding stuff: +#include <llvm/MC/MCInst.h> +#include <llvm/Support/MemoryObject.h> +#include <llvm/Support/raw_ostream.h> Please use "" instead of <>. You can also just forward declare all these classes instead of #including their headers. +#include "assert.h" Please use the c++ version of c headers when possible: <cassert>. + MCDisassembler(MemoryObject& region, + raw_ostream& vStream) { Please use "foo_t &f" instead of "foo_t& f". + virtual ~MCDisassembler() { + } Make sure each class with a virtual method has at least one method defined out of line: http://llvm.org/docs/CodingStandards.html#ll_virtual_anch Some bigger stuff: include/llvm/MC/MCDisassembler.h seems to have four copies of itself in the same file. In terms of API design: + /// Constructor - Performs initial setup for the disassembler. This may + /// do the work of disassembly, or may (especially on + /// fixed-instruction-length CPUs) set things up for lazy + /// disassembly. + /// + /// @param region - The memory object to use as a source for machine code. + /// @param vStream - The stream to print warnings and diagnostic messages on. + MCDisassembler(MemoryObject& region, + raw_ostream& vStream) { + } Forcing diagnostics to get rendered to a stream is somewhat unfortunate. Different clients may want different information. Would it be possible to use error codes, or make each action (like getInstruction) fill in an error code struct with possible failure info? The MCDisassembler API is also somewhat strange to me. Right now it is designed to analyze a memory region when constructed and contain them. It seems that there is a more useful low level API which would look something like this: class MCDisassembler { public: virtual ~MCDisassembler(); /// DisassembleInstruction - Disassemble the first instruction in the specified region, printing the disassembled instruction to the specified raw_ostream, and returning the size of the instruction in bytes. On error, this returns zero and fills in ErrorInfo with a human readable description of the error. virtual unsigned DisassembleInstruction(MemoryObject ®ion, raw_ostream &OS, std::string &ErrorInfo) = 0; } Having this sort of stateless API means that higher level clients (which are stateful) can be built on top of them without adding overhead to clients who don't care about the state. +++ include/llvm/Support/Indenter.h (revision 0) @@ -0,0 +1,72 @@ +//===- Indenter.h - Output indenter -----------------------------*- C+ + -*-===// I'm not a big fan of manipulators like this (as David Greene can tell you ;-). I just added a new raw_ostream::indent method, so now instead of stuff like this: +void DisassemblerTables::emitContextDecision( + std::ostream& o2, + Indenter& i2, ... + o2 << i2.indent() << "struct ContextDecision " << name << " = {" << std::endl; + i2.push(); + o2 << i2.indent() << "{ /* opcodeDecisions */" << std::endl; + i2.push(); You can use: +void DisassemblerTables::emitContextDecision( + raw_ostream &o2, + unsigned &Indent2, + o2.indent(Indent2++) << "struct ContextDecision " << name << " = {\n"; + o2.indent(Indent2++) << "{ /* opcodeDecisions */\n"; Which reads better, is more efficient, and scales to more than 256 levels of indentation. +#include <fcntl.h> // for fcntl, open +#include <stdio.h> // for perror +#include <sys/types.h> // for read +#include <sys/uio.h> // for read +#include <unistd.h> // for read unistd etc won't work on windows. Please use the llvm/Support/ MemoryBuffer.h API for reading files. The LineReader is easier to implement with MemoryBuffer because you just scan for \n and \r in the memory range. +std::string stringFromMCInst_x86(const MCInst& insn, + const Target* target, + std::string& triple) { Please mark this static so that you can shrink the anon namespace. + X86ATTAsmPrinter* asmPrinter + dynamic_cast<X86ATTAsmPrinter*>(functionPass); Please don't use dynamic_cast: we're trying to eliminate RTTI a C cast should be fine. +int HexDisassembler::disassemble(const Target* target, + LineReader& reader) { ... + + if(targetName == "x86") { + disasm = new X86Disassembler::X86_32Disassembler(obj, logstream); + triple = "x86-apple-darwin"; + } + else if(targetName == "x86-64") { + disasm = new X86Disassembler::X86_64Disassembler(obj, logstream); + triple = "x86_64-apple-darwin"; + } + else { Instead of doing something like this, the X86 disassembler should register itself as part of the "Target" API the same way that asmprinters, MCAsmInfo, targets, etc do. The X86 specific disassembler should go in lib/Target/X86/Disassembler. Take a look at how the Target/X86/AsmParser stuff registers itself for a good example. --- tools/llvm-mc/HexDisassembler.h (revision 0) +++ tools/llvm-mc/HexDisassembler.h (revision 0) +#include <fstream> +#include <string> the fstream #include is not needed. It might make sense to change the disassemble APIs to take a MemoryBuffer, and have the llvm-mc.cpp driver just handle all the file reading stuff. I haven't reviewed the X86 specific changes yet, it would be good to get the high level design sorted out first, I'll review it in the next patch. Please commit the MemoryObject.h changes (int64_t + comments) whenever you feel like it, and propose the basic API changes without the X86 specific part next. Thanks for working on this Sean, it's great work! -Chris
On Aug 19, 2009, at 4:39 PM, Sean Callanan wrote:> Bill, > > thanks for your comments. I'll respond to them individually. I've > attached a new revision of the patch that addresses them. Patch > built and tested against SVN 79487, with the additional attached fix > that fixes an Intel table bug. >Hi Sean, Thanks for the fixes!>> 1. Includes like this "#include <llvm/MC/MCInst.h>" should be in >> quotes, not angle brackets. > > Fixed. >I still saw a few more sneaking in there. Also, don't put system headers (like assert.h, cstdio, etc.) in quotes. I just meant the LLVM headers. :-)>> 2. In the "class Indenter", the "push" and "pop" methods check for >> over/under runs before proceeding. Should these be "asserts" instead? > > Yeah, I made them asserts as you suggested. I was thinking about > maybe making them saturate, but honestly if we run out of space we > should simply break and allow the developer to make the Indenter > bigger. Maybe we should have another argument to the constructor > specifying how large the buffer should be...? >Either that or make it a SmallVector with initial size "256". I *think* that all of the elements in that vector are guaranteed to be contiguous. (Might want to check.) But that way you won't have to worry about the array growing too big and needing more space.>> 3. In include/llvm/Support/MemoryObject.h, you changed all >> "uintptr_t" to "uint64_t". Is there a good reason for this? If not, >> please change them back.' > > Yes, there is. If you're disassembling in a memory space that's > larger than yours (for example, on a computer where LLVM is built 32- > bit but the source of the bytes is a 64-bit address space), then I > don't want to be limited by the local process's notion of uintptr_t. >I see. As Chris said, that makes sense. :) I was just worried that bits would fall on the floor if you convert a pointer value to a fixed bit-width size. Of course, when we go to 128-bit processors, we'll have to change this. :-D>> 4. If you have a class that's completely contained within a .cpp >> file, go ahead and define it in an anonymous namespace and mark it >> "VISIBILITY_HIDDEN". Like this: >> >> namespace { >> class VISIBILITY_HIDDEN StringMemoryObject : public MemoryObject { >> ... > > I used VISIBILITY_HIDDEN for both StringMemoryObject and > X86DisassemblerEmitter::X86DEBackend. > I wrapped StringMemoryObject in an anonymous namespace. > I didn't wrap X86DEBackend because it's already nested. >Thanks.>> 5. In "readBytes" in StringMemoryObject, you're returning a "-1", >> but the method's return type is "uint64_t". Do you *really* want to >> return 0xfffffffffffU, or is "-1" a sentinel value of some sort? If >> the latter, perhaps changing the return type to "int64_t" instead. >> If the former, then changing the return to "-1U" to make it explicit. > > -1 is a sentinel value. I changed readBytes to return an int and > accept a uint64_t* that it fills in if it's non-NULL. >Thanks. I'm not a huge fan of sentinel values if they can be avoided.>> 7. Look for opportunities to use the LLVM small container classes. >> For instance, the "fOperands" ivar in RecognizableInsn. > > I looked at my uses of std::vector but they wouldn't really benefit > from the good performance of SmallVector for small values of N. > > - fInstructions in X86Disassembler.h is not a good candidate for > this, because the number of instructions could potentially be quite > large. > - numberedInstructions in X86DisassemblerEmitter.cpp is not a good > candidate because there are upwards of a thousand instructions in > the x86 table. > - fInstructionSpecifiers in X86DisassemblerTables.h is not a good > candidate either... the number is (again) very large. > - fOperands is not a good candidate because is never actually > created. It is always set to point to > CodeGenInstruction::OperandList. >Thanks for checking!>> 11. In several places, you put 'default: assert(0 && "Badness >> happens here!");" at the end of a switch statement. When assertions >> are turned off, I believe that this could emit warnings. It would >> be better to put it as the first case of the "switch". And maybe >> use the new "llvm_unreachable()" macro as well. > > All right, I moved them and used llvm_unreachable(). In the C code, > where I can't safely #include "llvm/Support/ErrorHandling.h," I > declared a macro that does the same thing. >Great!>> 15. Prefer pre-increment/decrement for variables: >> >> http://llvm.org/docs/CodingStandards.html#micro_preincrement > > Learned a new one there. Thanks! >No prob. :-)>> 16. In X86RecognizableInsn.cpp, you use the large macro: >> HANDLE_OPERAND. Could you make it an inlined function instead? It >> could take a pointer to a member function for the method it wants >> to call. Debugging large macros is impossible. :) > > I factored the code out into a function. There are still macros to > keep things pretty, but they just wrap the function. >Thanks!>> 17. I'm confused by the C files in lib/Target/X86. Why aren't they C >> ++ files? > > The reason is that I am aware of clients that need disassembly but > can't link in C++ code, so for those clients I've separated out a C > core. They won't get the benefits of the MCInst, but that's their > problem :) >Ah! Okay...I wonder if it would be good to place them in the llvm-c bindings somehow? I honestly don't know much about those bindings, so this might not be feasible. But it would be good to keep icky C code out of pristine C++ code (tongue *firmly* planted in cheek). -bw
I was away doing other things for a while, but I have an API patch separated out, which (in addition to being much smaller than past megapatches) corrects two issues Chris identified in his most recent set of patches: - First, it makes the API a good deal simpler. Now, you can instantiate a single MCDisassembler and, each time you want an instruction disassembled, you can simply pass it a MemoryRegion, an offset, and an MCInst to populate. - Second, it adds MCDisassembler to the list of things you can get from a TargetMachine, so you don't have to #include something from lib/ Target/X86 just to use the X86 disassembler. Please let me know what you think of this API. Thanks for your time. Sean I have included Chris's previous comments below. On Aug 22, 2009, at 4:31 PM, Chris Lattner wrote:> The MCDisassembler API is also somewhat strange to me. Right now it > is designed to analyze a memory region when constructed and contain > them. It seems that there is a more useful low level API which > would look something like this: > > class MCDisassembler { > public: > > virtual ~MCDisassembler(); > > /// DisassembleInstruction - Disassemble the first instruction in > the specified region, printing the disassembled instruction to the > specified raw_ostream, and returning the size of the instruction in > bytes. On error, this returns zero and fills in ErrorInfo with a > human readable description of the error. > virtual unsigned DisassembleInstruction(MemoryObject ®ion, > raw_ostream &OS, std::string &ErrorInfo) = 0; > > } > > Having this sort of stateless API means that higher level clients > (which are stateful) can be built on top of them without adding > overhead to clients who don't care about the state.> Instead of doing something like this, the X86 disassembler should > register itself as part of the "Target" API the same way that > asmprinters, MCAsmInfo, targets, etc do. The X86 specific > disassembler should go in lib/Target/X86/Disassembler. Take a look > at how the Target/X86/AsmParser stuff registers itself for a good > example.-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20090903/16292849/attachment.html> -------------- next part -------------- A non-text attachment was scrubbed... Name: llvm-disassembler-api.diff Type: application/octet-stream Size: 10632 bytes Desc: not available URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20090903/16292849/attachment.obj> -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20090903/16292849/attachment-0001.html>