Will Wilson via llvm-dev
2017-Jun-08 12:07 UTC
[llvm-dev] [MS] Partial PDB (/DEBUG:FASTLINK) parsing support in LLVM
Hi Zach (or anyone else who may have a clue),

I'm currently investigating making use of LLVM for PDB parsing, with a view to supporting partial PDBs as produced by /DEBUG:FASTLINK, as the VS DIA SDK hasn't been updated to handle them. I know this is probably low on your priority list, but since /DEBUG:FASTLINK is now the implied default for VS2017 I figure it's a good time to take a look at it.

Unfortunately I'm finding very little information on the internal structure used by partial PDBs. It seems https://github.com/Microsoft/microsoft-pdb doesn't offer much either, unless I'm missing something...

So, two questions: Are you planning to try and support partial PDBs? And do you have any good references for their layout?

Many thanks,
Will.
Zachary Turner via llvm-dev
2017-Jun-08 16:43 UTC
[llvm-dev] [MS] Partial PDB (/DEBUG:FASTLINK) parsing support in LLVM
I didn't believe you at first that DIA SDK didn't support partial PDBs, so I went and tried `llvm-pdbdump pretty -types foo.pdb` on a partial PDB and it caused llvm-pdbdump to crash. When I looked further, it turns out IDiaSymbol::findChildren() is returning E_NOTIMPL. Wow! I'm a bit surprised honestly.

I've pushed a fix for this in r304982, but all that does is make llvm-pdbdump not crash. It still doesn't display any types.

Luckily llvm-pdbdump has another mode (accessible via the `raw` subcommand) that can bypass the DIA SDK and show you the underlying structure. Here's what I get when I try dumping types of a partial PDB.

D:\src\llvmbuild\ninja>bin\llvm-pdbdump raw -tpi-records cpptest.pdb
Type Info Stream (TPI) {
  TPI Version: 20040203
  Record count: 0
  Records [
    TypeIndexOffsets [
    ]
  ]
}

Umm, ok. So there's *actually* no types in the PDB.

Let's try symbols.

D:\src\llvmbuild\ninja>bin\llvm-pdbdump raw -module-syms cpptest.pdb
DBI Stream {
  # snip
  Modules [
    {
      Name: test2.obj
      # snip
      Symbols [
        {
          UnknownSym {
            Kind: 0x1167
            Length: 52
          }
        }
        {
          UnknownSym {
            Kind: 0x1167
            Length: 64
          }
        }
        {
          UnknownSym {
            Kind: 0x1167
            Length: 60
          }
        }
        # thousands of similar lines snipped.

So this is a little bit more interesting.
Let's see what these records look like:

D:\src\llvmbuild\ninja>bin\llvm-pdbdump raw -module-syms -sym-record-bytes cpptest.pdb
DBI Stream {
  # snip
  Modules [
    {
      Name: test2.obj
      # snip
      Symbols [
        {
          UnknownSym {
            Kind: 0x1167
            Length: 52
          }
          Bytes (
            0000: 30140000 04005F5F 76635F61 74747269 |0.....__vc_attri|
            0010: 62757465 733A3A65 76656E74 5F736F75 |butes::event_sou|
            0020: 72636541 74747269 62757465 00000000 |rceAttribute....|
          )
        }
        {
          UnknownSym {
            Kind: 0x1167
            Length: 64
          }
          Bytes (
            0000: 29140000 04005F5F 76635F61 74747269 |).....__vc_attri|
            0010: 62757465 733A3A65 76656E74 5F736F75 |butes::event_sou|
            0020: 72636541 74747269 62757465 3A3A6F70 |rceAttribute::op|
            0030: 74696D69 7A655F65 00000000          |timize_e....|
          )
        }
        {
          UnknownSym {
            Kind: 0x1167
            Length: 60
          }
          Bytes (
            0000: 27140000 04005F5F 76635F61 74747269 |'.....__vc_attri|
            0010: 62757465 733A3A65 76656E74 5F736F75 |butes::event_sou|
            0020: 72636541 74747269 62757465 3A3A7479 |rceAttribute::ty|
            0030: 70655F65 00000000                   |pe_e....|
          )
        }
        {
          UnknownSym {
            Kind: 0x1167
            Length: 68
          }
          Bytes (
            0000: 0C140000 04005F5F 76635F61 74747269 |......__vc_attri|
            0010: 62757465 733A3A68 656C7065 725F6174 |butes::helper_at|
            0020: 74726962 75746573 3A3A7631 5F616C74 |tributes::v1_alt|
            0030: 74797065 41747472 69627574 65000000 |typeAttribute...|
          )
        }

So, this symbol record with kind 0x1167 is pretty interesting, and clearly related to /debug:fastlink. Its format can be deduced as something like this:

struct DebugFastLinkRecord {
  char Unknown[6];
  char Name[0];     // null terminated string
  char Padding[0];  // pad to 4 bytes
};

What those first 6 bytes are I can't tell you.

Let's see what else we can find. Another source of interesting debug info comes from what I refer to as "debug subsections". In an object file, every .debug$S section is basically just a big list of these. In a PDB file though, the debug subsections appear embedded inside each module's debug stream, which is similar to a .debug$S section but with some additional PDB-specific stuff.
You can find llvm-pdbdump's code for parsing this in ModuleDebugStream.cpp.

Anyway, the part we're interested in can be dumped using llvm-pdbdump raw -subsections=unknown. I say unknown because we're looking for stuff that is unique to /debug:fastlink PDBs, so presumably any /debug:fastlink specific data would be something we don't know about / have never seen before. (Note that this command line option hasn't made it upstream yet, it's still in review. But expect it today or tomorrow if all goes well.)

So we'll try this:

bin\llvm-pdbdump raw -subsections=unknown cpptest.pdb
DBI Stream {
  # snip
  Modules [
    {
      Name: test2.obj
      # snip
      Subsections [
        Unknown {
          Kind: 0xFD
          Data (
            0000: 00000000 00000000 00000000 00000000 |................|
            0010: 00000000 00000000 00000000 00000000 |................|
            0020: 00000000 00000000 00000000 B0240100 |.............$..|
            0030: 00000000 00000000 00000000 00000000 |................|
            0040: 00000000 B0240100 90270100 D0270100 |.....$...'...'..|
            0050: 90990100 00000000 00000000 90990100 |................|
            0060: A49C0100 00000000 00000000 A49C0100 |................|
          )
        }
      ]
    }

Neat! What is this thing? 0xFD is 253, and looking that up in our DebugSubsectionKind enumeration <https://github.com/llvm-mirror/llvm/blob/master/include/llvm/DebugInfo/CodeView/CodeView.h#L317> shows that this is a CoffSymbolRVA subsection.

The format of that subsection can very likely be understood by reading the code in the Microsoft repo, but I haven't investigated it yet.

Hopefully this is a good starting point. llvm-pdbdump is a pretty useful tool for investigating these types of issues, so let me know if you try it out and have suggestions for how to improve it.

As mentioned, some of the commands I demonstrated above are still not upstream yet, but I'll try to get them in this week.
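For anyone who wants to poke at the kind 0x1167 records by hand, here is a minimal Python sketch of the layout guessed earlier in this message (6 unknown bytes, a null-terminated name, then padding). The function name parse_fastlink_record and the returned field names are hypothetical and not part of llvm-pdbdump. The layout itself is only a deduction from the hex dumps, and the leading 6 bytes are kept opaque because their meaning is still unknown.

```python
def parse_fastlink_record(payload: bytes):
    """Split a kind-0x1167 record payload per the layout guessed in this thread.

    Assumed layout (deduced from the hex dumps, not a documented format):
      bytes 0..5   unknown
      bytes 6..    null-terminated name
      remainder    zero padding
    `payload` is taken to be the bytes llvm-pdbdump prints under "Bytes".
    """
    if len(payload) < 7:
        raise ValueError("payload too short for 6 unknown bytes plus a name")
    unknown = payload[:6]
    end = payload.index(b"\x00", 6)  # raises ValueError if no terminator
    name = payload[6:end].decode("ascii")
    padding = payload[end + 1:]
    return unknown, name, padding


# First record from the -sym-record-bytes dump (the one with Length: 52).
first_record = bytes.fromhex(
    "30140000" "04005F5F" "76635F61" "74747269"
    "62757465" "733A3A65" "76656E74" "5F736F75"
    "72636541" "74747269" "62757465" "00000000"
)

unknown, name, padding = parse_fastlink_record(first_record)
print(unknown.hex(), name, len(padding))
```

The other three records in the dump differ only in their leading bytes and names, so the same function should split them too, assuming the guessed layout holds.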
Will Wilson via llvm-dev
2017-Jun-08 20:38 UTC
[llvm-dev] [MS] Partial PDB (/DEBUG:FASTLINK) parsing support in LLVM
Hi Zach,

A big thanks for the detailed analysis, it's super helpful. I've already been making use of llvm-pdbdump and it's proven very useful for my needs so far - especially in the absence of any real documentation of the PDB format.

I ran into the missing partial PDB support in DIA some time ago. Some background reading: https://developercommunity.visualstudio.com/content/problem/4631/dia-sdk-still-doesnt-support-debugfastlink.html

MS does provide the mspdbcmf.exe tool (https://blogs.msdn.microsoft.com/vcblog/2016/10/05/faster-c-build-cycle-in-vs-15-with-debugfastlink/) for conversion from partial to full PDBs. It might prove useful for testing parsing support, although it does seem to have occasional issues with incremental thunks and compiland env entries when parsing the converted PDB via DIA. It's also pretty slow at converting larger PDBs.

I should have some more time in the coming week to take a closer look at fastlink related parsing. So when you can get the -subsections=unknown option committed I'll dive in.

Many thanks,
Will.

--
Indefiant: http://www.indefiant.com
Home of Recode: Runtime C++ Editing for VS