thr3ads.net - llvm dev - [llvm-dev] PDB questions [Aug 2018]

Zachary Turner via llvm-dev

2018-Aug-31 14:06 UTC

[llvm-dev] PDB questions

For the first and third questions, the easiest thing to do would be to run
llvm-pdbutil under a debugger and step through the code. Code that looks
simple and innocuous can often have a lot of stuff hidden behind it. For
example, you could step through the loop that iterates the debug
subsections, look at the value of Reader.getOffset() on every iteration, and
see whether it matches your own code (it probably doesn't). Or you could dump
the entire contents of the C13 substream and check whether the bytes match
those in your own implementation. It looks like you’re reading all 0s, so
maybe you’re not even reading the right data.
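One way to act on the "dump the C13 substream and compare bytes" suggestion is a small helper that prints a buffer in the same `f2 00 00 00  60 00 00 00` grouping used in the debug output quoted in item 1, with a row offset on each line so the rows can be checked against the Reader.getOffset() values seen in the debugger. This is a minimal sketch; `hexBytes` and `dumpBytes` are hypothetical names, not part of llvm-pdbutil.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Formats up to `len` bytes of `data`, starting at `offset`, as lowercase hex
// with a double space after every 4 bytes, e.g. "f2 00 00 00  60 00 00 00".
std::string hexBytes(const std::vector<uint8_t>& data, size_t offset, size_t len) {
  std::string out;
  char buf[4];
  for (size_t i = 0; i < len && offset + i < data.size(); ++i) {
    if (i > 0) out += (i % 4 == 0) ? "  " : " ";
    std::snprintf(buf, sizeof buf, "%02x", static_cast<unsigned>(data[offset + i]));
    out += buf;
  }
  return out;
}

// Dumps the whole buffer, 16 bytes per row, each row prefixed by its offset,
// so it can be diffed against the same dump taken from llvm-pdbutil.
void dumpBytes(const std::vector<uint8_t>& data) {
  for (size_t off = 0; off < data.size(); off += 16)
    std::printf("%08zx: %s\n", off, hexBytes(data, off, 16).c_str());
}
```

Dumping the first few rows of the C13 substream from both programs and diffing them usually shows quickly whether the two readers start at the same offset.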

For the second question, unfortunately I don’t know of a better way. If
the /names stream starts with a magic header, maybe you could walk each
stream looking for that. But it may be possible to get a rare false
positive that way.
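A minimal sketch of the "walk each stream looking for a magic header" idea. It assumes the /names stream begins with a 12-byte header of three little-endian u32 fields {Signature, HashVersion, ByteSize}, with Signature equal to 0xEFFEEFFE (the value of LLVM's `PDBStringTableSignature` constant), and that valid hash versions are 1 or 2. The extra HashVersion and ByteSize checks are there to reduce, not eliminate, false positives. `findNamesStreamCandidates` is a hypothetical name, and the input is assumed to be every stream already read out of the MSF container.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed signature at the start of the PDB string table ("/names") stream.
constexpr uint32_t kStringTableSignature = 0xEFFEEFFE;

// Reads a little-endian u32 at `off`; the caller guarantees off + 4 <= size.
static uint32_t readLE32(const std::vector<uint8_t>& s, size_t off) {
  return uint32_t(s[off]) | uint32_t(s[off + 1]) << 8 |
         uint32_t(s[off + 2]) << 16 | uint32_t(s[off + 3]) << 24;
}

// Returns the indices of all streams whose first 12 bytes look like a string
// table header. More than one hit means the result is ambiguous.
std::vector<size_t> findNamesStreamCandidates(
    const std::vector<std::vector<uint8_t>>& streams) {
  std::vector<size_t> hits;
  for (size_t i = 0; i < streams.size(); ++i) {
    const std::vector<uint8_t>& s = streams[i];
    if (s.size() < 12) continue;
    if (readLE32(s, 0) != kStringTableSignature) continue;
    uint32_t hashVersion = readLE32(s, 4);
    uint32_t byteSize = readLE32(s, 8);
    // Sanity checks (assumptions): known hash version, and the string
    // buffer that follows the header must fit inside the stream.
    if (hashVersion != 1 && hashVersion != 2) continue;
    if (byteSize > s.size() - 12) continue;
    hits.push_back(i);
  }
  return hits;
}
```

Returning every candidate, rather than the first match, lets the caller detect the rare false positive instead of silently picking the wrong stream.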

BTW, have you considered just using llvm’s library rather than porting it?
It certainly seems like less work.
On Thu, Aug 30, 2018 at 11:49 PM Andrew Kelley <superjoe30 at gmail.com>
wrote:
> One more:
>
> 3. For the purpose of mapping a source file index to a string, I found
> this code:
>
> Expected<codeview::DebugChecksumsSubsectionRef>
> ModuleDebugStreamRef::findChecksumsSubsection() const {
>   codeview::DebugChecksumsSubsectionRef Result;
>   for (const auto &SS : subsections()) {
>     if (SS.kind() != DebugSubsectionKind::FileChecksums)
>       continue;
>
>     if (auto EC = Result.initialize(SS.getRecordData()))
>       return std::move(EC);
>     return Result;
>   }
>   return Result;
> }
>
> Subsections() is populated here:
>
>   if (auto EC = Reader.readSubstream(C13LinesSubstream, C13Size))
>     return EC;
>
>   BinaryStreamReader SymbolReader(SymbolsSubstream.StreamData);
>   if (auto EC =
>           SymbolReader.readArray(SymbolArray, SymbolReader.bytesRemaining()))
>     return EC;
>
>   BinaryStreamReader SubsectionsReader(C13LinesSubstream.StreamData);
>   if (auto EC = SubsectionsReader.readArray(Subsections,
>                                             SubsectionsReader.bytesRemaining()))
>     return EC;
>
> So it looks like there should be one of these just after the C13Lines
> substream:
>
> struct DebugSubsectionHeader {
>   support::ulittle32_t Kind;   // codeview::DebugSubsectionKind enum
>   support::ulittle32_t Length; // number of bytes occupied by this record.
> };
>
> But when I look there with my own code I only see zeroes:
>
> read C13 line info 142964 bytes
> DebugSubsectionHeader{ .Kind = DebugSubsectionKind.None, .Length = 0 }
> DebugSubsectionHeader{ .Kind = DebugSubsectionKind.None, .Length = 0 }
> DebugSubsectionHeader{ .Kind = DebugSubsectionKind.None, .Length = 0 }
> DebugSubsectionHeader{ .Kind = DebugSubsectionKind.None, .Length = 0 }
> DebugSubsectionHeader{ .Kind = DebugSubsectionKind.None, .Length = 0 }
> DebugSubsectionHeader{ .Kind = DebugSubsectionKind.None, .Length = 0 }
> DebugSubsectionHeader{ .Kind = DebugSubsectionKind.None, .Length = 0 }
> <repeats until end of stream>
>
> Any clues?
>
>
> On Fri, Aug 31, 2018 at 2:17 AM Andrew Kelley <superjoe30 at gmail.com>
> wrote:
>
>> Zachary,
>>
>> Thanks for the help on IRC earlier. I've got code that can capture a
>> stack trace and then discover, for each address, its module, function,
>> source index, line, and column.
>>
>> I still have a couple of loose ends, though. Do you know what's going on
>> here?
>>
>> 1. There appear to be 8 bytes before every LineFragmentHeader. Here's
>> some of my own debug output, which matches llvm-pdbutil's output. You can
>> see it says "unknown bytes: ...".
>>
>> read C13 line info 136720 bytes
>> unknown bytes: f2 00 00 00  60 00 00 00
>> LineFragmentHeader{ .RelocOffset = 0, .RelocSegment = 5, .Flags = LineFlags{ .LF_HaveColumns = true, .unused = 0 }, .CodeSize = 52 }
>> has column: true
>> LineBlockFragmentHeader{ .NameIndex = 0, .NumLines = 6, .BlockSize = 84 }
>> LineNumberEntry{ .Offset = 0, .Flags = 101 } Flags{ .Start = 101, .End = 17, .IsStatement = false }
>> <snip some LineNumberEntry's>
>> ColumnNumberEntry{ .StartColumn = 5, .EndColumn = 0 }
>> ColumnNumberEntry{ .StartColumn = 30, .EndColumn = 0 }
>> unknown bytes: f2 00 00 00  f0 00 00 00
>> LineFragmentHeader{ .RelocOffset = 64, .RelocSegment = 5, .Flags = LineFlags{ .LF_HaveColumns = true, .unused = 0 }, .CodeSize = 366 }
>> has column: true
>> LineBlockFragmentHeader{ .NameIndex = 8, .NumLines = 18, .BlockSize = 228 }
>> LineNumberEntry{ .Offset = 0, .Flags = 53 } Flags{ .Start = 53, .End = 20, .IsStatement = false }
>> LineNumberEntry{ .Offset = 20, .Flags = 54 } Flags{ .Start = 54, .End = 24, .IsStatement = false }
>> <etc>
>>
>> Do you know what's going on with these 8 bytes? I have scoured
>> llvm-pdbutil's source but I cannot find where these bytes are coming from.
>>
>> 2. Is there a simpler way to find out which stream index is the /names
>> (string table) stream, without porting the entire hash table
>> implementation?
>>

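In the ModuleDebugStreamRef code quoted in item 3, SubsectionsReader is constructed over C13LinesSubstream.StreamData itself, so the DebugSubsectionHeader records appear to start at offset 0 of the C13 substream rather than after it. A minimal sketch of walking the substream that way and picking out the FileChecksums subsection (kind 0xf4), in the spirit of findChecksumsSubsection(). It assumes each {Kind, Length, data} record is padded to a 4-byte boundary, which is what LLVM's subsection reader appears to do. `walkSubsections`, `findChecksums` and `SubsectionRef` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct SubsectionRef {
  uint32_t kind;      // DebugSubsectionKind value
  size_t dataOffset;  // offset of the record data within the C13 substream
  uint32_t length;    // length of the record data in bytes
};

static uint32_t readLE32(const std::vector<uint8_t>& b, size_t off) {
  return uint32_t(b[off]) | uint32_t(b[off + 1]) << 8 |
         uint32_t(b[off + 2]) << 16 | uint32_t(b[off + 3]) << 24;
}

// Walks the C13 substream as consecutive {Kind, Length, data[Length]}
// records starting at offset 0. Stops at a truncated record.
std::vector<SubsectionRef> walkSubsections(const std::vector<uint8_t>& c13) {
  std::vector<SubsectionRef> out;
  size_t off = 0;
  while (off + 8 <= c13.size()) {
    uint32_t kind = readLE32(c13, off);
    uint32_t length = readLE32(c13, off + 4);
    if (length > c13.size() - off - 8) break;  // record runs past the end
    out.push_back({kind, off + 8, length});
    size_t next = off + 8 + length;
    off = (next + 3) & ~size_t(3);  // assumed 4-byte alignment between records
  }
  return out;
}

constexpr uint32_t kFileChecksums = 0xf4;

// Finds the first FileChecksums subsection; returns false if there is none.
bool findChecksums(const std::vector<uint8_t>& c13, SubsectionRef& result) {
  for (const SubsectionRef& ss : walkSubsections(c13)) {
    if (ss.kind == kFileChecksums) {
      result = ss;
      return true;
    }
  }
  return false;
}
```

If the walker is pointed at the wrong region, it tends to yield exactly the repeated `{ .Kind = None, .Length = 0 }` records shown in item 3, which is a useful symptom to compare against.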
Andrew Kelley via llvm-dev

2018-Aug-31 16:52 UTC

[llvm-dev] PDB questions

Thanks for the advice. I'll examine llvm-pdbutil's behavior with a
debugger.

On Fri, Aug 31, 2018 at 10:06 AM Zachary Turner <zturner at google.com>
wrote:
> BTW, have you considered just using llvm’s library rather than porting it?
> It certainly seems like less work.
>
The point of these stack traces is that they go into the userland runtime
code. So if I did this, it would cause my users' programs to depend on
LLVM. I don't think that's right. I already have working stack traces for
Linux and macOS that don't depend on any libraries and don't add more than
~20KB to the runtime size.

Andrew Kelley via llvm-dev

2018-Sep-02 22:34 UTC

[llvm-dev] PDB questions

Thanks again for your help, Zachary.

Here's a screenshot of stack traces working on Windows!
https://i.imgur.com/eOQO0GT.png

I owe you some PDB documentation patches.
