Zachary Turner via llvm-dev
2018-Dec-24 01:01 UTC
[llvm-dev] [llvm-pdbutil] : merge not working properly
The merge feature was implemented primarily for testing and was never really productionized, so your guess about the underlying problem sounds correct to me. We could probably hide the subcommand so users don’t accidentally use it, or, if someone wants to properly implement the missing features, that would be even better.

On Sat, Dec 22, 2018 at 10:48 AM Vivien Millet via llvm-dev <llvm-dev at lists.llvm.org> wrote:

> When trying to merge 2 PDBs that each have their own DBI stream, I end up
> with a PDB with an inconsistent number of streams and no DBI stream (or at
> least not at fixed index 3, producing a corrupt error when dumping with -l).
> Looking at the code, it seems that we don't merge any streams other than the
> TPI and IPI streams, am I right?
> Is the "merge" feature completely implemented?
> Thanks
Zachary Turner via llvm-dev
2019-Jan-14 21:48 UTC
[llvm-dev] [llvm-pdbutil] : merge not working properly
Yes, I am the person who wrote this feature (along with most other PDB-related features). I thought about it some, and I think it's a bit hard (if not impossible) to merge PDBs in this way. Here's a short list of things I came up with.

1) We need to merge the list of modules. This requires first detecting whether two modules are actually the same. For example, if I run llvm-pdbutil on a random PDB on my disk, I get this (output is trimmed for brevity):

    $ llvm-pdbutil.exe dump -modules bin\not.pdb

                              Modules
    ===========================================================
      Mod 0000 | `D:\src\llvmbuild\cl\Debug\x64\utils\not\CMakeFiles\not.dir\not.cpp.obj`:
      Obj: `D:\src\llvmbuild\cl\Debug\x64\utils\not\CMakeFiles\not.dir\not.cpp.obj`:
      debug stream: 14, # files: 80, has ec info: false
      pdb file ni: 0 ``, src file ni: 0 ``
      Mod 0001 | `lib\Support\CMakeFiles\LLVMSupport.dir\Program.cpp.obj`:
      Obj: `D:\src\llvmbuild\cl\Debug\x64\lib\LLVMSupport.lib`:
      debug stream: 46, # files: 102, has ec info: false
      pdb file ni: 0 ``, src file ni: 0 ``

The easiest thing to do is to consider them the same only if both the module name and the object name are identical, but depending on your use case this might not be sufficient. For example, if the 2 PDBs were built in different output directories, you might have `D:\foo\not.cpp.obj` in one PDB and `D:\bar\not.cpp.obj` in the other. So we would need to find a solution that makes sense here.

2) When two modules are the same, we need to merge their file list and debug stream.
In the above example:

    $ llvm-pdbutil.exe dump -files -modi=0 bin\not.pdb

                              Files
    ===========================================================
      Mod 0000 | `D:\src\llvmbuild\cl\Debug\x64\utils\not\CMakeFiles\not.dir\not.cpp.obj`:
      - (MD5: 2FE06AF7EACFB232C6FF033DBFC4412E) c:\program files (x86)\microsoft visual studio\2017\professional\vc\tools\msvc\14.16.27023\include\stdexcept
      - (MD5: 0B299654FBC61F03E9533F9296BBD2B3) c:\program files (x86)\microsoft visual studio\2017\professional\vc\tools\msvc\14.16.27023\include\xstring
      etc...

    $ llvm-pdbutil.exe dump -symbols -modi=0 bin\not.pdb

                              Symbols
    ===========================================================
      Mod 0000 | `D:\src\llvmbuild\cl\Debug\x64\utils\not\CMakeFiles\not.dir\not.cpp.obj`:
           4 | S_OBJNAME [size = 80] sig=0, `D:\src\llvmbuild\cl\Debug\x64\utils\not\CMakeFiles\not.dir\not.cpp.obj`
          84 | S_COMPILE3 [size = 60]
               machine = intel x86-x64, Ver = Microsoft (R) Optimizing Compiler, language = c++
               frontend = 19.16.27024.1, backend = 19.16.27024.1
               flags = security checks | hot patchable
         144 | S_UNAMESPACE [size = 20] `__vc_attributes`

The file list is easy, but for the symbol records, some of these records might be the same in 2 different object files, and some might be different. So we need to de-duplicate them into the final PDB. LLD actually already does this, so a lot of the code for this portion is probably already written in LLD. See PDBLinker::mergeSymbolRecords in lld/COFF/PDB.cpp. The algorithm is slightly different when merging 2 PDBs, but that's the general idea.

3) We need to merge the publics and globals stream, similar to the above.

For #2 and #3 above, this is going to be tricky. How do you know if 2 symbols are actually the same symbol? Even if they have the same name it might, for example, be a symbol for a certain function F. Suppose the first PDB is for executable A, and the second PDB is for executable B.
What if the generated code for function F in executable A differs from the generated code for F in executable B? Does that end up as two symbols in the merged PDB, or one? I'm not sure if there's a good way to handle this.

I guess it might help to know more about your intended use case. Then we might be able to make some simplifications to the problem that would allow us to decide on a reasonable solution.

On Mon, Jan 14, 2019 at 5:39 AM Vivien Millet <vivien.millet at gmail.com> wrote:

> Were you the man in charge of this feature? If not, do you know who was
> in charge (to see what could be the best way / what is missing to complete
> this feature)?
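
To make points 1 and 2 above a bit more concrete, here is a minimal sketch in Python of the two decisions involved: when to treat two modules as the same, and how a naive record de-duplication would behave. This is not how llvm-pdbutil or LLD's PDBLinker::mergeSymbolRecords works. The `Module` and `Record` types, the function names, and the "loose" matching rule (compare file names only, ignoring directory and case) are all assumptions made for illustration, and the data is modeled as plain tuples rather than read from a real PDB.

```python
import ntpath  # Windows path rules, usable on any host OS
from typing import List, NamedTuple, Tuple


class Module(NamedTuple):
    """One module-list entry (hypothetical model, not an LLVM type)."""
    name: str  # text shown after "Mod NNNN |" in the dump
    obj: str   # text shown after "Obj:" in the dump


def strict_key(m: Module) -> Tuple[str, str]:
    # Same module only if module name and object name are identical.
    return (m.name, m.obj)


def loose_key(m: Module) -> Tuple[str, str]:
    # Assumption: ignore directories and case, compare file names only.
    return (ntpath.basename(m.name).lower(), ntpath.basename(m.obj).lower())


def match_modules(a, b, key=strict_key) -> List[Tuple[int, int]]:
    """Return (index in a, index in b) pairs for modules treated as the same."""
    index = {key(m): i for i, m in enumerate(a)}
    return [(index[key(m)], j) for j, m in enumerate(b) if key(m) in index]


# Hypothetical record model: (record kind, raw payload bytes).
Record = Tuple[str, bytes]


def merge_records(*streams) -> List[Record]:
    """Concatenate streams in order, dropping records whose kind and
    payload are both identical to one already emitted."""
    seen = set()
    merged = []
    for stream in streams:
        for rec in stream:
            if rec not in seen:
                seen.add(rec)
                merged.append(rec)
    return merged
```

With `D:\foo\not.cpp.obj` in one module list and `D:\bar\not.cpp.obj` in the other, `match_modules` finds no pair under `strict_key` and one pair under `loose_key`. Likewise, `merge_records` keeps two separate records when function F has the same name but a different payload in each input, which is exactly the "two symbols or one?" question: a content-based rule answers "two", and whether that is the right answer depends on the use case.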