Zachary Turner via llvm-dev
2019-Jan-14 21:48 UTC
[llvm-dev] [llvm-pdbutil] : merge not working properly
Yes, I am the person who wrote this feature (along with most other PDB-related features).

I thought about it some, and I think it's a bit hard (if not impossible) to merge PDBs in this way. Here's a short list of things I came up with:

1) We need to merge the list of modules. This requires first detecting whether two modules are actually the same. For example, if I run llvm-pdbutil on a random PDB on my disk, I get this (output is trimmed for brevity):

    $ llvm-pdbutil.exe dump -modules bin\not.pdb

                              Modules
    ===========================================================
      Mod 0000 | `D:\src\llvmbuild\cl\Debug\x64\utils\not\CMakeFiles\not.dir\not.cpp.obj`:
                 Obj: `D:\src\llvmbuild\cl\Debug\x64\utils\not\CMakeFiles\not.dir\not.cpp.obj`:
                 debug stream: 14, # files: 80, has ec info: false
                 pdb file ni: 0 ``, src file ni: 0 ``
      Mod 0001 | `lib\Support\CMakeFiles\LLVMSupport.dir\Program.cpp.obj`:
                 Obj: `D:\src\llvmbuild\cl\Debug\x64\lib\LLVMSupport.lib`:
                 debug stream: 46, # files: 102, has ec info: false
                 pdb file ni: 0 ``, src file ni: 0 ``

The easiest thing to do is to consider two modules the same only if both the module name and the object name are identical, but depending on your use case this might not be sufficient. For example, if the 2 PDBs were built in different output directories, you might have `D:\foo\not.cpp.obj` in one PDB and `D:\bar\not.cpp.obj` in the other. So we would need to find a solution that makes sense here.

2) When two modules are the same, we need to merge their file list and debug stream. In the above example:

    $ llvm-pdbutil.exe dump -files -modi=0 bin\not.pdb

                               Files
    ===========================================================
      Mod 0000 | `D:\src\llvmbuild\cl\Debug\x64\utils\not\CMakeFiles\not.dir\not.cpp.obj`:
      - (MD5: 2FE06AF7EACFB232C6FF033DBFC4412E) c:\program files (x86)\microsoft visual studio\2017\professional\vc\tools\msvc\14.16.27023\include\stdexcept
      - (MD5: 0B299654FBC61F03E9533F9296BBD2B3) c:\program files (x86)\microsoft visual studio\2017\professional\vc\tools\msvc\14.16.27023\include\xstring
      etc...

    $ llvm-pdbutil.exe dump -symbols -modi=0 bin\not.pdb

                              Symbols
    ===========================================================
      Mod 0000 | `D:\src\llvmbuild\cl\Debug\x64\utils\not\CMakeFiles\not.dir\not.cpp.obj`:
           4 | S_OBJNAME [size = 80] sig=0, `D:\src\llvmbuild\cl\Debug\x64\utils\not\CMakeFiles\not.dir\not.cpp.obj`
          84 | S_COMPILE3 [size = 60]
               machine = intel x86-x64, Ver = Microsoft (R) Optimizing Compiler, language = c++
               frontend = 19.16.27024.1, backend = 19.16.27024.1
               flags = security checks | hot patchable
         144 | S_UNAMESPACE [size = 20] `__vc_attributes`

The file list is easy, but for the symbol records, some of these records might be the same in 2 different object files, and some might be different. So we need to de-duplicate them into the final PDB. LLD actually already does this, so a lot of the code for this portion is probably already written in LLD. See PDBLinker::mergeSymbolRecords in lld/COFF/PDB.cpp. The algorithm is slightly different when merging 2 PDBs, but that's the general idea.

3) We need to merge the publics and globals streams, similar to the above.

For #2 and #3 above, this is going to be tricky. How do you know if 2 symbols are actually the same symbol? Even if they have the same name, it might, for example, be a symbol for a certain function F. Suppose the first PDB is for executable A, and the second PDB is for executable B. What if the generated code for function F in executable A differs from the generated code for F in executable B? Does that end up as two symbols in the merged PDB, or one? I'm not sure if there's a good way to handle this.

I guess it might help to know more about your intended use case. Then we might be able to make some simplifications to the problem that would allow us to decide on a reasonable solution.

On Mon, Jan 14, 2019 at 5:39 AM Vivien Millet <vivien.millet at gmail.com> wrote:

> Were you the man in charge of this feature? If not, do you know who was in charge (to see what could be the best way / what is missing to complete this feature)?
>
> On Mon, Dec 24, 2018 at 02:01, Zachary Turner <zturner at google.com> wrote:
>
>> The merge feature was implemented primarily for testing but was never really productionized, so your guess about what the underlying problem is sounds correct to me. We could probably hide the subcommand so users don’t accidentally use it, or if someone wants to properly implement the missing features, that would be even better.
>>
>> On Sat, Dec 22, 2018 at 10:48 AM Vivien Millet via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>
>>> When trying to merge 2 PDBs which each have their own DBI stream, I end up with a PDB with an inconsistent number of streams and no DBI stream (or at least not at fixed index 3, producing a corrupt error when dumping with -l).
>>> Looking at the code, it seems that we don't merge any streams other than the TPI and IPI streams, am I right?
>>> Is the "merge" feature completely implemented?
>>> Thanks
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> llvm-dev at lists.llvm.org
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
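The module-matching question in point 1 can be made concrete with a small sketch. This is not llvm-pdbutil code: `Module`, `strict_key`, and `basename_key` are hypothetical names, case-insensitive comparison is an assumption, and the paths are the illustrative ones from the message. It contrasts the "both names identical" rule with a looser rule that ignores directories, which is only one possible answer to the different-output-directory case.

```python
from dataclasses import dataclass
from ntpath import basename  # Windows-style path handling on any host


@dataclass(frozen=True)
class Module:
    """Hypothetical stand-in for one entry of a PDB module list."""
    name: str      # module name, as printed by `dump -modules`
    obj_name: str  # the "Obj:" field of the same entry


def strict_key(m: Module):
    # Same module only if module name and object name are both identical
    # (compared case-insensitively, which is an assumption of this sketch).
    return (m.name.lower(), m.obj_name.lower())


def basename_key(m: Module):
    # Looser rule (also an assumption): ignore the directory part entirely.
    return (basename(m.name).lower(), basename(m.obj_name).lower())


a = Module(r"D:\foo\not.cpp.obj", r"D:\foo\not.cpp.obj")
b = Module(r"D:\bar\not.cpp.obj", r"D:\bar\not.cpp.obj")

print(strict_key(a) == strict_key(b))      # False
print(basename_key(a) == basename_key(b))  # True
```

The looser key would also merge two unrelated object files that merely share a file name, so neither rule is obviously right; which one fits depends on the use case.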
Vivien Millet via llvm-dev
2019-Jan-15 10:50 UTC
[llvm-dev] [llvm-pdbutil] : merge not working properly
Hello Zachary!

Thanks for your time! So you are one of the happy guys who suffered from the lack of PDB format information :)

To be honest, I'm really a beginner in the PDB stuff; I just read some LLVM documentation to understand what went wrong when merging my PDBs. In my case, what my team and I are trying to achieve is this:

- Run our application under a Visual Studio debugger
- Generate JIT code (using LLVM MCJIT)
- Then, either:
  - export as a COFF obj file with DWARF information and then convert it with cv2pdb to obtain a PDB of my JIT symbols (what I do now)
  - export my JIT debug info directly to PDB (what I would like to do, if you have an idea how..)
- Detach the Visual Studio debugger
- Merge my JIT PDB into a copy of the executable PDB (where things start to go bad..)
- Replace the original executable with the copy (creating a backup of the original)
- Reattach the Visual Studio debugger to my executable (loading the new PDB version)
- Debug JIT code with Visual Studio
- On each JIT rebuild, restart these steps from the original native executable PDB to avoid merge conflicts between the multiple JIT iterations

So, concerning the three stages you describe:

- 1): I would be even more naive: I would consider every module as a new module, without trying to merge them by name (but I might be too naive..)
- 2) and 3): Same here. In my case it is impossible to have the same symbols/modules conflicting, so I would again choose the simplest, naive case: by default every symbol is always a new symbol (addition).

Options for the 'merge' feature could be the classical set-style operations: addition (default) / union / intersection. And this option could apply at different levels of the merge (modules -> symbols -> etc).

I'm not a PDB warrior like you, and I don't know what consequences choosing one way or another has on the final PDB structure and readability, so I trust you on the right choice to make.

Do you think what I'm trying to achieve is doable? Would you help me to do it?

Thank you

PS: BTW, if you or someone knows another (better/easier) way to debug MCJIT code with Visual Studio, I'm really open to hearing about it!
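The three merge modes proposed in this message could behave roughly as follows. This is a toy model rather than PDB code: records are plain strings, and `merge_records` and the mode names are hypothetical. It only illustrates how "addition" differs from "union" and "intersection" when the same record appears on both sides.

```python
def merge_records(left, right, mode="addition"):
    """Merge two lists of records (plain hashable values in this toy model)."""
    if mode == "addition":
        # Every incoming record is treated as new; nothing is de-duplicated.
        return list(left) + list(right)
    if mode == "union":
        # Keep the first occurrence of each record, preserving order.
        return list(dict.fromkeys(list(left) + list(right)))
    if mode == "intersection":
        # Keep only the records present on both sides, in left-hand order.
        right_set = set(right)
        return [r for r in dict.fromkeys(left) if r in right_set]
    raise ValueError(f"unknown merge mode: {mode}")


exe = ["main", "helper"]       # records from the executable PDB
jit = ["jit_entry", "helper"]  # records from the JIT PDB

print(merge_records(exe, jit))                  # ['main', 'helper', 'jit_entry', 'helper']
print(merge_records(exe, jit, "union"))         # ['main', 'helper', 'jit_entry']
print(merge_records(exe, jit, "intersection"))  # ['helper']
```

When the two inputs can never share a record, as described for the JIT case, "addition" and "union" give the same result, which is why the naive default is attractive there.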
Zachary Turner via llvm-dev
2019-Jan-15 21:48 UTC
[llvm-dev] [llvm-pdbutil] : merge not working properly
On Tue, Jan 15, 2019 at 2:50 AM Vivien Millet <vivien.millet at gmail.com> wrote:

> Hello Zachary !
> Thanks for your time !
> So you are one of the happy guys who suffered from the lack of PDB format information :)

Yes, that would be me :)

> To be honest I'm really a beginner in the PDB stuff, I just read some llvm documentation to understand what went wrong when merging my PDBs.
> In my case, what I do with my team and try to achieve is this :
> - Run our application under a visual studio debugger
> - Generate JIT code ( using llvm MCJIT )
> - Then, either :
>   - export as COFF obj file with dwarf information and then convert it with cv2pdb to obtain a pdb of my JIT symbols (what I do now)
>   - export directly to PDB my JIT debug info (what i would like to do, if you have an idea how..)
> - Detach the visual studio debugger
> - Merge my JIT pdb into a copy of the executable pdb (where things start to go bad..)
> - Replace original executable by the copy (creating a backup of original)
> - Reattach the visual studio debugger to my executable (loading the new pdb version)
> - Debug JIT code with visual studio.
> - On each JIT rebuild, restart these steps from the original native executable PDB to avoid merge conflict between the multiple JIT iterations

Yea, it's an interesting use case. It makes me think it would be nice if the PDB format supported some way of having a symbol which simply refers to another PDB file. That way you could rewrite that PDB file at runtime once all your code is jitted, and when the debugger tries to look up that symbol, it finds a record that tells it to go check the other PDB file.

So, here are the things I think you would need to do:

1) Create a JIT module in the module list with a unique name. All symbols will go here. llvm-pdbutil dump -modules shows you the list. Be careful about putting it at the end though, because there's already one at the end called * LINKER * that is kind of special. On the other hand, you don't want to put it first, because that means you will have to do lots of fixups on the EXE PDB. It's probably best to add it right before the linker module; this has the least chance of breaking anything.

2) In the debug stream for this module, add all symbols. You will need to fix up their type indices. As you noticed, llvm-pdbutil already merges type information from the JIT PDB, so after merging, the type indices in the EXE PDB will be different than they were in the JIT PDB, but the symbol records will still refer to the JIT PDB type indices. So these need to be fixed up. LLD already has code to do this; you can probably borrow a similar algorithm with some slight modifications (lld/COFF/PDB.cpp, search for mergeSymbolRecords).

3) Merge in the new section contributions and section map. See LLD again for how to modify these. Hopefully the object file you exported contains relocated symbol addresses, so you don't have to do any fixups here.

4) Merge in the publics and globals. This shouldn't be too hard; I think you can just iterate over them in the JIT PDB and add them to the new EXE PDB.

You're kind of in uncharted territory here, so this is just a rough idea of what needs to be done. There may be other issues that you don't encounter until you actually try it out. Unfortunately I don't personally have the time to work on this, but it sounds neat, and I'm happy to help if you run into questions or problems along the way.
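Steps 1 and 2 above can be sketched in a few lines. This is a toy model and not LLD or llvm-pdbutil code: the module list is a plain list of names, symbols are `(name, type_index)` pairs, and `insert_before_linker`, `remap_type_indices`, and every index value are hypothetical. The `index_map` is assumed to come from whatever step merged the JIT PDB's types into the EXE PDB. The one format detail relied on is that CodeView type indices below 0x1000 denote built-in simple types, which have the same meaning in every PDB and therefore need no remapping.

```python
LINKER_MODULE = "* LINKER *"


def insert_before_linker(modules, new_module):
    """Return a new module list with new_module placed just before the linker module."""
    if new_module in modules:
        raise ValueError(f"module name is not unique: {new_module}")
    result = list(modules)
    # Fall back to appending when the list has no linker module at all.
    index = result.index(LINKER_MODULE) if LINKER_MODULE in result else len(result)
    result.insert(index, new_module)
    return result


def remap_type_indices(symbols, index_map):
    """Rewrite each symbol's type index from JIT PDB numbering to merged numbering."""
    fixed = []
    for name, type_index in symbols:
        # Indices below 0x1000 are simple (built-in) types and stay unchanged.
        if type_index >= 0x1000:
            type_index = index_map[type_index]
        fixed.append((name, type_index))
    return fixed


modules = ["not.cpp.obj", "Program.cpp.obj", LINKER_MODULE]
print(insert_before_linker(modules, "JIT"))
# ['not.cpp.obj', 'Program.cpp.obj', 'JIT', '* LINKER *']

# Hypothetical mapping: JIT PDB type index -> index in the merged EXE PDB.
index_map = {0x1000: 0x2A40, 0x1001: 0x2A41}
jit_symbols = [("jit_add", 0x1001), ("jit_counter", 0x0074)]
for name, type_index in remap_type_indices(jit_symbols, index_map):
    print(name, hex(type_index))
```

A real implementation would have to rewrite the indices inside each serialized symbol record rather than in a tuple, which is the part the message suggests borrowing from mergeSymbolRecords.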