John Reagan via llvm-dev
2017-Feb-03 19:03 UTC
[llvm-dev] RFC: Add a way to interleave source code in assembler output
I want to jump in on this too. As part of porting the OpenVMS compilers (BASIC, COBOL, Pascal, Fortran, C, VAX Macro assembler, BLISS) to LLVM for our x86 port, we want to provide some scheme for our traditional OpenVMS "listing files", which can optionally include generated machine code, cross-reference information, a command line summary, frontend-generated messages interspersed with the source listing, display of header file contents, expansion of macros, optimization information, inlining heuristic results, etc. My plan was to come back here with an RFC later this year, after we have our early cross-compilers in place for the OS team to do their porting effort. I have some resources that could help with such an effort.

Besides using these listings for debugging, we also use them for system archive purposes. Over the 30+ years we have been doing this, having a full machine code listing from an older release has been invaluable for debugging system crashes, etc. With such a listing file, you can see EXACTLY the source that was compiled and the exact generated code, all in a single file. Our generated code contains symbolized variable and routine names along with line number information. And by having a qualifier summary at the tail end of the listing file, you can always tell exactly which command line options were specified by the make files/command files/etc. that built the software.

What I envision is some listing manager that is fed from various places in the compiler. The frontend (not just clang but any frontend) could provide file information, source line information, error message information, command line processing, etc. The backend can provide optimization data, inline decisions, generated code, diagnostics for uninitialized variables, unreachable code, etc. Then the listing manager would collect, sort, etc. and generate the single listing file.
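
To make the "collect, sort, generate" idea a bit more concrete, here is a rough sketch in Python. It is only an illustration under my own assumptions: the class name ListingManager, the section names, and the sample records are all hypothetical, and nothing here is an existing LLVM or OpenVMS interface. Producers (frontend, backend, driver) add records in any order; the manager sorts them by section and source line and separates sections with a form feed.

```python
from dataclasses import dataclass, field

# Hypothetical section order for the final listing (an assumption for this
# sketch, not an LLVM or OpenVMS API).
SECTION_ORDER = ["source", "machine_code", "optimization", "options"]


@dataclass(order=True)
class Record:
    section_rank: int              # position of the section in the listing
    line: int                      # source line the record refers to
    seq: int                       # arrival order, used as a tie-breaker
    text: str = field(compare=False)


class ListingManager:
    """Collects records from any producer and renders one listing."""

    def __init__(self):
        self._records = []

    def add(self, section, line, text):
        rank = SECTION_ORDER.index(section)  # ValueError on unknown section
        self._records.append(Record(rank, line, len(self._records), text))

    def render(self):
        out = []
        current = None
        for rec in sorted(self._records):
            if rec.section_rank != current:
                if current is not None:
                    out.append("\f")  # form feed between sections
                out.append(SECTION_ORDER[rec.section_rank].upper())
                current = rec.section_rank
            out.append(rec.text)
        return "\n".join(out)


# Producers may report in any order; the manager sorts everything.
mgr = ListingManager()
mgr.add("options", 0, "CC/LIST/MACH HW")                 # driver
mgr.add("machine_code", 1616, "__MAIN:  // 001616")      # backend
mgr.add("source", 1617, "1617 m(ii);")                   # frontend
mgr.add("source", 1616, "1616 main () {")                # frontend
mgr.add("source", 1617, "(hypothetical message for line 1617)")
print(mgr.render())
```

Because the arrival order is only a tie-breaker, a frontend message for a given line ends up directly after that source line, which is the "interspersed" behaviour described above.
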
You might be able to cobble something together inside a driver if you pick apart the output from various tools, but it wouldn't have the look and feel of a single generated file.

As for interspersed machine code, that is often only useful with O0 or O1 compilations. Once you get to O2 and higher, it is often just best to have the machine code following the source code with line numbers (either as ".loc" directives or just with "end of line" comments). In many cases, we use the comment area to denote "interesting" instructions in the prologue/epilogue that correspond to unwind information.

The traditional VMS listings are 132 columns wide with "^L" form feeds between sections since they were originally designed to be printed on greenbar printer paper (Google it if you don't know what I'm talking about :) )

Here's a little abridged example for the traditional "hello world" C program.

              1 #include <stdio.h>
X          1612 #if 0
X          1613 An X is placed at the left to show this is eXcluded
X          1614 #endif
           1615 #define m(p) int p;
      1    1616 main () {
      1    1617     m(ii);
      E                 int ii ;
      1    1618     printf("hello world\n");
      1    1619 }
^L
Machine Code Listing   3-FEB-2017 14:01:37   VSI C V7.4-001-50L7J   Page 2
                       3-FEB-2017 14:00:56   WORK20:[JREAGAN]HW.C;17

                        .psect  $CODE$, CON, LCL, SHR, EXE, NOWRT, NOVEC, NOSHORT
                        .proc   __MAIN
                        .align  32
                        .global __MAIN
                        .personality DECC$$SHELL_HANDLER
                        .handlerdata -8
                __MAIN:                                 // 001616
                    { .mii
002C009229C0 0000       alloc   r39 = rspfs, 6, 3, 8, 0
0120000A0380 0001       mov     r14 = 80
010800100A00 0002       mov     r40 = gp ;;             // r40 = r1
                    }
                    { .mib
010028E183C0 0010       sub     r15 = sp, r14           // r15 = r12, r14
000188000980 0011       mov     r38 = rp                // r38 = br0
004000000000 0012       nop.b   0 ;;
                    }
.....

+------------+
| SYMBOL MAP |
+------------+

Identifier name         Line  Size  Aligned  Storage Cl.  Type
_______________         ____  ____  _______  ___________  ____
DECC$RECORD_READ        1433  4     long     Extern       Function returning signed int
DECC$RECORD_WRITE       1434  4     long     Extern       Function returning signed int
FILE                    646   4     long     Typedef:     short pointer to struct _iobuf
__FILE                  496   4     long     Typedef:     short pointer to struct _iobuf
__FILE_ptr32            497   4     long     Typedef:     short pointer to short pointer to struct _iobuf
__caddr_t               469   4     long     Typedef:     short pointer to char
__char_ptr32            499   4     long     Typedef:     short pointer to char
__char_ptr64            551   4     long     Typedef:     short pointer to char
__char_ptr_const_ptr32  515   4     long     Typedef:     short pointer to const short pointer to char
__char_ptr_const_ptr64  555   4     long     Typedef:     short pointer to const short pointer to char
__char_ptr_ptr32        514   4     long     Typedef:     short pointer to short pointer to char
__char_ptr_ptr64        554   4     long     Typedef:     short pointer to short pointer to char
.....
^L
Source Listing   3-FEB-2017 13:49:45   VSI C V7.4-001-50L7J   Page 10
                 3-FEB-2017 13:49:28   WORK20:[JREAGAN]HW.C;16

CC/LIST/MACH/SHOW=SYMBOLS HW

Hardware: /ARCHITECTURE=GENERIC /OPTIMIZE=TUNE=GENERIC

These macros are in effect at the start of the compilation.
----- ------ --- -- ------ -- --- ----- -- --- ------------
__G_FLOAT=0                 __DECC=1                    vms=1
VMS=1                       __32BITS=1                  __PRAGMA_ENVIRONMENT=1
__ia64__=1                  __CRTL_VER=80400000         __vms_version="V8.4-2 "
CC$gfloat=0                 __X_FLOAT=1                 vms_version="V8.4-2 "
__DATE__="Feb 3 2017"       __STDC_VERSION__=199901L    __DECC_MODE_RELAXED=1
__DECC_VER=70490001         __VMS=1                     VMS_VERSION="V8.4-2 "
__IEEE_FLOAT=1              __VMS_VERSION="V8.4-2 "     __TIME__="13:49:45"
__ia64=1                    __VMS_VER=80420022          __BIASED_FLT_ROUNDS=2
__INITIAL_POINTER_SIZE=0    __STDC__=2                  _IEEE_FP=1
__LANGUAGE_C__=1            __vms=1                     __D_FLOAT=0

> Message: 7
> Date: Fri, 3 Feb 2017 16:31:13 +0000
> From: Roger Ferrer Ibanez via llvm-dev <llvm-dev at lists.llvm.org>
> To: "cfe-dev at lists.llvm.org" <cfe-dev at lists.llvm.org>, llvm-dev
>     <llvm-dev at lists.llvm.org>
> Cc: nd <nd at arm.com>
> Subject: [llvm-dev] RFC: Add a way to interleave source code in
>     assembler output
> Message-ID: <DB6PR0802MB2534F3C7B3B6A9FDF7C1E631874F0 at DB6PR0802MB2534.eurprd08.prod.outlook.com>
> Content-Type: text/plain; charset="us-ascii"
>
> Dear llvm/clang community,
>
> I'm interested in adding a way to emit source code interleaved in the
> output of the assembler.
>
> - Introduction
>
> A feature that several compilers have and clang/llvm is missing is the
> possibility of interleaving source code in the assembler output (e.g.
> when using -S).
>
> This feature is useful for a number of reasons. For those users who are
> concerned with the quality of the code, code size, debugging and
> inspection or analysis of the generated assembler.
>
> An essential requirement of this feature is having location information
> at the point where the assembler code is emitted. Location information
> is currently not part of the instruction representation itself but
> instead is encoded as part of the debug information. This means that to
> have location information we need to make sure the FE is emitting some
> minimal amount of debugging information containing location.
> This is currently possible in clang using -gline-tables-only but other FE's
> might choose to emit this information under some other conditions.
>
> I made an implementation which shows that the impact on the existing
> codebase is low.
>
> - Rationale
>
> Closing the gap between input source code and the generated
> instructions is important for users that are concerned about the
> correctness and quality of the generated code. This feature would help
> to reduce this gap by providing better context to the emitted
> instructions. Incidentally it can also help debugging wrong code.
>
> - Related work
>
> This is a feature commonly available in production compilers
> [1][2][3][4].
>
> [1] http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0472m/chr1359124927770.html
> [2] https://gcc.gnu.org/gcc-7/changes.html (see "Other significant
>     improvements" by the end of the document)
> [3] https://software.intel.com/en-us/node/523027
> [4] https://msdn.microsoft.com/en-us/library/367y26c6.aspx
>
> https://llvm.org/bugs/show_bug.cgi?id=16647
> https://llvm.org/bugs/show_bug.cgi?id=17465 suggests some workarounds.
> A comment also points to a patch that I could not retrieve.
>
> - Proposal
>
> This proposal currently spans LLVM and clang.
>
> -- clang/FE changes
>
> For clang it would simply mean to add a flag like -fsource-asm or maybe
> extend the meaning of -fverbose-asm (like it will happen in GCC 7 but
> see some further comments below). This flag would make sure that the
> minimal amount of debug information is generated. Currently this means
> enabling -gline-tables-only in absence of any other debugging flag
> specified. A flag -masm-source for communicating the driver and cc1
> will be added as well.
>
> Other FE's can provide other specific mechanisms to enable source
> interleave.
>
> -- llvm changes
>
> For llvm I suggest creating a new AsmPrinterHandler called,
> tentatively, SourceInterleave that would be responsible of printing the
> lines related to the instructions. SourceInterleave would take care of
> loading the files and making sure the source code lines are emitted as
> comments.
>
> This handler would be enabled through MCOptions (similar to what
> happens with AsmVerbose). The current option is tentatively called
> AsmSource.
>
> Currently AsmPrinterHandler mechanism looks slightly geared towards
> debug information but it also used for EH. So I think using it for
> printing interleaved source is a good fit.
>
> - Discussion
>
> In case this proposal is positively received I would like to gather
> some feedback on the following items.
>
> -- The name of the flag itself for clang
>
> My current implementation uses -fsource-asm but maybe we want to
> integrate this feature in -fverbose-asm for this (as gcc 7 will do). I
> have no strong preference, but maybe overloading -fverbose-asm may have
> some undesirable consequences: recall that we need to enable some, even
> if minimal, debugging information in clang for this feature to be
> useable.
>
> -- Enabling debug information causes debug information also to be
> emitted
>
> This currently makes the output unnecessarily hard to read due
> basically to .loc directives.
>
> Currently my implementation uses "-masm-source=1" and "-masm-source=2"
> for cc1 which is then communicated to the MCOption AsmSource. When
> AsmSource is not 1, debug is emitted as usual, otherwise only
> SourceInterleave is used.
>
> This way
> "clang -fsource-asm" would pass "-masm-source=1". So only interleaved
> source would be printed, without the extra debug directives.
> "clang -fsource-asm -g" (or any other debug enabling flag) would pass
> "-masm-source=2" extending the current behaviour of emitting debug
> information with interleaved source.
>
> I think this is OK but maybe there is some subtlety regarding "having
> debug information around but not generating its directives" as it would
> happen under AsmSource==1.
>
> Also -masm-source=1/-masm-source=2 are just stand-ins. Something a bit
> more explanatory like -masm-source=nodebug and -masm-source=debug can
> be used instead.
>
> -- Would it make sense to map the "/FAs" flag of clang-cl to this
> feature as well?
>
> I can't really answer this question because I am not sure what are the
> expectations of the clang-cl users in terms of closeness to VS's cl.exe
> behaviour.
>
> Looking forward your feedback. I can put in phabricator the patches for
> my current implementation if this helps the discussion.
>
> Kind regards,
> Roger
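
For readers who want to see what the interleaving amounts to, here is a small stand-in sketch in Python. To be clear about the assumptions: this is NOT the proposed AsmPrinterHandler. It is a post-processing illustration that reads ".loc <file> <line> [<column>]" directives from assembler text, assumes a single source file (the file number is ignored), and uses "#" as the comment character. The function name interleave and the keep_loc parameter are hypothetical; keep_loc only loosely mirrors the difference between the two tentative -masm-source modes (source only versus source plus debug directives).

```python
import re

# Matches ".loc <file-number> <line-number> ..." as emitted in assembler text.
LOC_RE = re.compile(r"^\s*\.loc\s+(\d+)\s+(\d+)")


def interleave(asm_text, source_lines, comment="#", keep_loc=False):
    """Insert source lines as comments where .loc directives appear.

    Assumes one source file; source_lines[0] is source line 1.
    """
    out = []
    last_line = None
    for asm_line in asm_text.split("\n"):
        m = LOC_RE.match(asm_line)
        if m is None:
            out.append(asm_line)          # ordinary instruction or label
            continue
        line_no = int(m.group(2))
        # Emit each source line once per run of instructions for that line.
        if line_no != last_line and 1 <= line_no <= len(source_lines):
            out.append(f"{comment} {line_no}: {source_lines[line_no - 1]}")
            last_line = line_no
        if keep_loc:
            out.append(asm_line)          # keep the debug directive as well
    return "\n".join(out)


source = ["int main() {", "  return 0;", "}"]
asm = (
    "main:\n"
    "\t.loc 1 1 0\n"
    "\tpushq %rbp\n"
    "\t.loc 1 2 3\n"
    "\txorl %eax, %eax\n"
    "\t.loc 1 2 10\n"
    "\tpopq %rbp\n"
    "\tretq"
)
print(interleave(asm, source))
```

With keep_loc=False the .loc directives are dropped and only the source comments remain, which is the easier output to read; with keep_loc=True both appear.
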