thr3ads.net - llvm dev - [llvm-dev] [llvm-pdbutil] : merge not working properly [Jan 2019]

Vivien Millet via llvm-dev

2019-Jan-28 19:22 UTC

[llvm-dev] [llvm-pdbutil] : merge not working properly

Hello Zachary,
Sorry for replying so late, but I have spent the past week thinking about and
working hard on your "DLL memory buffer" idea to see if it works and give you
feedback!
And it works pretty well so far. Here is what I did:
- Create a .ASM file full of "int 3" instructions (to ensure that if we
  execute past the boundaries, we break instantly).
- Compile this to a .DLL.
- Use a hexadecimal editor to change the ".text" section Characteristics from
  Read/Execute to Read/Write/Execute.
- Run my program, which does JIT compilation.
- Get the start RVA of the .text section (which is always 0x1000 in my case).
- Load the .DLL and use ModuleAddress+RVA as a memory buffer in a custom
  DllMemMgr I give to MCJIT.
- On NotifyObjectEmitted, replace the DLL's PDB with a custom one I build
  myself with your PDBFileBuilder.
- On finalizing memory, first reload the DLL to trigger Visual Studio PDB
  reloading (not working, I don't know why yet), ensure it goes into the same
  virtual address space, and protect the memory using VirtualProtect.
- Place a breakpoint in my JIT file (it displays "loaded"), execute the JIT
  code, and it breaks
...and ....
* drums *
Visual Studio CRASHES when I open the Watch window or Locals/Auto/etc ...
and this happens every time, I don't know why..
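The "ModuleAddress+RVA as a memory buffer" step in the list above can be sketched as a small bump allocator over a fixed region. This is only an illustration under stated assumptions: `FixedRegionAllocator` and its methods are hypothetical names, not the actual DllMemMgr. In a real MCJIT setup the same logic would presumably sit behind the `allocateCodeSection` / `allocateDataSection` overrides of an `llvm::RTDyldMemoryManager` subclass.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: hands out aligned chunks from a fixed, pre-reserved
// region (for example the DLL section located at ModuleAddress + RVA).
class FixedRegionAllocator {
public:
  FixedRegionAllocator(std::uint8_t *Region, std::size_t RegionSize)
      : Base(Region), Size(RegionSize), Used(0) {}

  // Returns nullptr if Alignment is not a power of two or the region is full.
  std::uint8_t *allocate(std::size_t Bytes, std::size_t Alignment) {
    if (Alignment == 0 || (Alignment & (Alignment - 1)) != 0)
      return nullptr;
    // Align the absolute address, not just the offset into the region.
    const std::uintptr_t BaseAddr = reinterpret_cast<std::uintptr_t>(Base);
    const std::uintptr_t Start = BaseAddr + Used;
    const std::uintptr_t Aligned =
        (Start + Alignment - 1) & ~(static_cast<std::uintptr_t>(Alignment) - 1);
    const std::size_t Offset = static_cast<std::size_t>(Aligned - BaseAddr);
    if (Offset > Size || Bytes > Size - Offset)
      return nullptr;
    Used = Offset + Bytes;
    return Base + Offset;
  }

  // Offset of Ptr from the start of the region. Adding the section's start
  // RVA (0x1000 in the message above) gives an RVA inside the module.
  std::size_t offsetOf(const std::uint8_t *Ptr) const {
    return static_cast<std::size_t>(Ptr - Base);
  }

  std::size_t bytesUsed() const { return Used; }

private:
  std::uint8_t *Base;
  std::size_t Size;
  std::size_t Used;
};
```

Because every allocation stays inside the module's address range, each JIT symbol gets an RVA that a PDB for that module can describe.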
I noticed, when compiling the C++ equivalent of my JIT program, that a simple
"int param" is written with size=20 in the C++ PDB and size=16 in my JIT PDB.
Do you know what this "size" attribute represents in the S_LOCAL symbol
section ? I suspect the symbol section to be the problem behind the watch
issue, but I am not sure. If you have an idea...
I also got an "illegal instruction" exception when stepping with F10 after a
break, but when I don't break, the code runs well..

A lot of mysteries there again...

Visual Studio displays the disassembly correctly, with the debug lines at the
right place, etc., so I don't get why Visual Studio crashes..
Another issue I have is that I always have to remove and re-add my breakpoint
so that Visual Studio really breaks, even if it says "I'm a good breakpoint at
that good address". Is this related to file checksums ? It seems mine has a
"none" checksum, so I suspect this is the problem.. but I don't know how to
fix it, because I added the checksum with addChecksum with the right file name
and I still get "none" in the dump...
So right now I'm quite hopeful because I get something reacting in Visual
Studio, but I have no idea why it crashes..

Have you already encountered this issue when testing your generated PDBs ?
Do you know the role of Section Contributions in the PDB/debugging session ?
Any tip for checking symbol record validity in the dump ? It looks good to me:
no "???" or "Error" anywhere..

Thank you !





On Wed, Jan 23, 2019 at 22:29, Zachary Turner <zturner at google.com> wrote:
> .text is where code goes, I don't know why it's called .text, it's just
> been that way for many decades and the name stuck around.  But actually you
> can call the section whatever you want.  Maybe it's even better to call it
> something other than .text, because .text is where your DllMain and other
> stuff will be.  You could call it .jit if you wanted to.  You should be
> able to create the section with whatever flags you want to.  You'll need to
> produce a jit_code.obj probably compiled from assembly that makes a section
> named .jit and sets the flags to be executable (you can just copy the flags
> from a normal .text section of some other program).  Then link this file
> together along with a jitted_code_main.obj which you compiled from a simple
> source file with a DllMain function that does nothing.  This would make
> jitted_code.dll, then have your program link against jitted_code.lib.
>
> Right now you jit the code into some buffer that you created with
> VirtualAlloc.  If you do the above, it will load jitted_code.dll into
> memory and the OS loader will allocate some memory for each section.  So
> this would be like your VirtualAlloc, you can just find the address of the
> .jit section and use that buffer instead of the VirtualAlloc buffer as the
> target address of your jit operations.
>
> Again, this is just an idea, no promises it will work, but unfortunately
> that's kind of the best you can do when dealing with closed source things,
> just make guesses and hope for the best.
>
>
>
> On Wed, Jan 23, 2019 at 12:42 PM Vivien Millet <vivien.millet at gmail.com>
> wrote:
>
>> (Yes you are right this is my fault)
>> Regarding the string table, it only seems to contain file-related
>> information in every pdb I am using, and it looks correct, but I will check
>> it.
>> I looked at the pdb.cpp code about checksums and tables. I copied some
>> stuff and got things wrong according to cvdump, then I simplified the
>> process of copying the table and it worked (cvdump finds the file
>> matching each line, etc.), so I suspect this is also correct.
>>
>> All the streams look good, but I will check deeper !
>>
>> What you say about RVAs and modules seems right, and this is what I am
>> afraid of: doing all of this for nothing, or almost..
>>
>> Your idea of a .text section in a separate DLL looks good, but
>> will it be executable memory ? .text is where static strings go, right ?
>> When you say putting my JIT code in there, do you mean writing it once
>> jitted_code.dll is loaded in memory, or into the .dll file directly before
>> loading it ? In the first scenario I wonder if the section will be
>> executable; in the second scenario I can’t do it because it would require
>> perfect linking with the other code my JIT code points to..
>>
>> On Wed, Jan 23, 2019 at 20:57, Zachary Turner <zturner at google.com> wrote:
>>
>>> (BTW, I'm adding llvm-dev back to the list, I didn't notice it got taken
>>> off.  In general I try to keep the list on all emails, even if it's
>>> extremely technical and specific, because someday someone else will try to
>>> do this, and it'll be nice if they can read the whole thread).
>>>
>>> I can think of a couple of things that might be wrong:
>>>
>>> 1) If the string table is in a different order, then anything that
>>> refers to the string table needs to be changed to refer to the new offset.
>>> If the string "foo" is at offset 12 in the old PDB, but offset 15 in the
>>> new PDB, then somewhere there is a record which is going to look at offset
>>> 12 and expect to find something, and that will mess up.  The main place
>>> this is important is in the File Checksums table, there is an entry that
>>> says which file it is a checksum for, and that refers to the string table.
>>> However, it's possible for certain symbol records to refer to the string
>>> table too.  See lld/COFF/PDB.cpp and Ctrl+F for "PDBStrTab" and you will
>>> find some information about this.
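
A minimal sketch of the offset problem described in point 1, assuming a simplified string table (NUL-terminated names packed in one buffer and referenced by byte offset). `StringTable`, `ChecksumEntry` and `remapChecksums` are hypothetical names for illustration, not LLVM classes, and the layout is not the exact PDB one.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified string table: names are stored back to back,
// each followed by a NUL byte, and are referenced by their byte offset.
struct StringTable {
  std::string Data;
  std::unordered_map<std::string, std::uint32_t> Offsets;

  // Returns the offset of S, appending it if it is not present yet.
  std::uint32_t insert(const std::string &S) {
    auto It = Offsets.find(S);
    if (It != Offsets.end())
      return It->second;
    const std::uint32_t Off = static_cast<std::uint32_t>(Data.size());
    Data.append(S);
    Data.push_back('\0');
    Offsets.emplace(S, Off);
    return Off;
  }

  // Reads the NUL-terminated string starting at Off (Off must be valid).
  std::string at(std::uint32_t Off) const {
    return std::string(Data.c_str() + Off);
  }
};

// Hypothetical checksum entry: it names its file by string table offset.
struct ChecksumEntry {
  std::uint32_t FileNameOffset;
};

// Rewrites every entry so it points at the same file name in the new table.
void remapChecksums(const StringTable &Old, StringTable &New,
                    std::vector<ChecksumEntry> &Entries) {
  for (ChecksumEntry &E : Entries)
    E.FileNameOffset = New.insert(Old.at(E.FileNameOffset));
}
```

Copying an entry without this step leaves it pointing at whatever happens to sit at the old offset in the new table.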
>>>
>>> 2) When you run `llvm-pdbutil dump -streams` on the copied PDB, do all
>>> of them show a reasonable description?  Are there any streams that say
>>> (???)?  If so, that's a problem.
>>>
>>> > will Visual Studio consider a symbol file broken if the address
>>> > goes beyond the official module address range (the compiled one), because
>>> > my JIT code is allocated after the end of the module with VirtualAlloc
>>>
>>> That is a good question, and part of why my job is so difficult, because
>>> I can't look at their code.  But I think the answer is "probably".  The
>>> debugger has to have some way to convert an address in your running process
>>> into a symbol and offset, because that's how all debug info is represented
>>> in the PDB.  So if there is no module, then there is no RVA (because the R
>>> in RVA means relative, and what would it be relative to?).
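
The address-to-RVA argument above can be illustrated with a short lookup. This is a sketch only: `LoadedModule`, `ModuleRva` and `addressToRva` are hypothetical names, and nothing here reflects how Visual Studio actually resolves addresses. It shows why an address in a plain VirtualAlloc buffer has no RVA: no loaded module contains it.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical description of a module mapped into the process.
struct LoadedModule {
  std::string Name;
  std::uint64_t Base; // load address
  std::uint64_t Size; // size of the mapped image in bytes
};

struct ModuleRva {
  const LoadedModule *Module = nullptr;
  std::uint64_t Rva = 0;
};

// Finds the module containing Address and computes Address - Base.
// Returns false when no module covers the address, in which case there is
// nothing for an RVA to be relative to.
bool addressToRva(const std::vector<LoadedModule> &Modules,
                  std::uint64_t Address, ModuleRva &Out) {
  for (const LoadedModule &M : Modules) {
    if (Address >= M.Base && Address - M.Base < M.Size) {
      Out.Module = &M;
      Out.Rva = Address - M.Base;
      return true;
    }
  }
  return false;
}
```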
>>>
>>> One idea to test this would be to create a DLL called jitted_code.dll,
>>> give it a huuuuuge .text section (probably just a .asm file and use some
>>> assembly directives to allocate a very large series of null bytes), and
>>> then write your jit code into that area.  This way you would not need to
>>> modify the existing PDB, you would only need to make a new PDB called
>>> jitted_code.pdb with 1 module, and those symbols could have meaningful
>>> RVAs.  And you might not even need to detach the debugger if you do things
>>> this way, because you could just right click the jitted_code.dll module in
>>> the modules window and choose Load Symbols.
>>>
>>>
>>>
>>> On Wed, Jan 23, 2019 at 11:13 AM Vivien Millet <vivien.millet at gmail.com>
>>> wrote:
>>>
>>>> Yes this is it, I just make a copy of a pdb generated by link.exe
>>>> (the Microsoft one).
>>>> Using llvm-pdbutil to compare is what I do, except I do it with "-all".
>>>> And I get almost everything the same: same number of streams, the section
>>>> map looks good, the string table looks good (even if not in the same
>>>> order), same number of modules with the symbols and subsections
>>>> practically the same. This is why I am stuck: I am missing something but I
>>>> can't see what, because I don't know where to look. Visual Studio works
>>>> with it, I can debug my original exe, but probably without the globals...
>>>> The other problem is that a difference between the dumps is not
>>>> necessarily a bug, because the builder may generate new hash values,
>>>> reorder streams, modules, etc ...
>>>>
>>>> Right now I have given up on having publics and globals streams and
>>>> attacked the real goal: inserting my JIT CodeView into the pdb. Again I
>>>> have done « something », but as I don’t understand how the format works, I
>>>> don’t have it working in Visual Studio.. except once: a single time it
>>>> worked and the breakpoint turned on in the UI (even if the RVA was broken
>>>> for the instructions), but it happened only once .. then I got depressed
>>>> the next times..... cvdump displays it all as « correct », with no corrupt
>>>> stuff apparently. But what I do is probably wrong somewhere. What I do is
>>>> take .debug$S and .debug$T as is, without relocations, just to see. What I
>>>> don’t really know is: will Visual Studio consider a symbol file broken if
>>>> the address goes beyond the official module address range (the compiled
>>>> one) ? My JIT code is allocated after the end of the module with
>>>> VirtualAlloc.
>>>> Another thing I don’t get is the section contribution: what is it
>>>> exactly ? I inserted section contributions for all sections except the
>>>> debug$ ones, but I don’t know what I’m really doing, and that’s my usual
>>>> problem implementing this JIT feature...
>>>> I also don’t know what relocations are inside the CodeView format. What
>>>> is the difference between an RVA and a relocation, and is there anything
>>>> to do about this for the CodeView part I need to insert in the pdb ? I
>>>> don’t see why Visual Studio needs more than just an RVA<->Line mapping..
>>>> Being so ignorant and trying to guess what Visual Studio does is really
>>>> driving me crazy...
>>>>
>>>> On Mon, Jan 21, 2019 at 19:50, Zachary Turner <zturner at google.com> wrote:
>>>>
>>>>> So if i understand correctly, you're basically just trying to
>>>>> implement something like a pdb *copy*, just as a test to see if you can get
>>>>> it to work.  So you generate a PDB with cl/link or clang-cl/lld-link, then
>>>>> try to copy it using your tool, then see if it still works.
>>>>>
>>>>> If this is correct, and it's not working, then there is probably just
>>>>> something you didn't copy.  Neither Publics nor globals actually contain
>>>>> their own data, instead they just refer to records from the corresponding
>>>>> module stream.  So an S_PROCREF for the function "main" might have fields
>>>>> that say "the name of the function is main, and it's at offset 20 of module
>>>>> 1".  So, if there is no module 1, or if offset 20 of module 1 is not actually
>>>>> an S_GPROC32 for the function main, then it will be broken.
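
The consistency rule in the paragraph above can be written down as a small check. The types below (`ModuleStream`, `ModuleSymbol`, `ProcRef`) are hypothetical stand-ins, not the LLVM record classes, and the 1-based module number is an assumption taken from the "module 1" example.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

enum class SymKind { GProc32, LProc32, Other };

// Hypothetical, simplified symbol record inside a module stream.
struct ModuleSymbol {
  std::uint32_t Offset; // byte offset of the record in the module stream
  SymKind Kind;
  std::string Name;
};

struct ModuleStream {
  std::vector<ModuleSymbol> Symbols;
};

// Hypothetical S_PROCREF-like record stored in the globals stream.
struct ProcRef {
  std::string Name;
  std::uint16_t Module;    // assumed 1-based, as in "module 1"
  std::uint32_t SymOffset; // offset of the procedure record in that module
};

// True only if the referenced module exists and the record at SymOffset is a
// procedure record with the same name.
bool procRefResolves(const std::vector<ModuleStream> &Modules,
                     const ProcRef &Ref) {
  if (Ref.Module == 0 || Ref.Module > Modules.size())
    return false;
  for (const ModuleSymbol &S : Modules[Ref.Module - 1].Symbols) {
    if (S.Offset == Ref.SymOffset)
      return (S.Kind == SymKind::GProc32 || S.Kind == SymKind::LProc32) &&
             S.Name == Ref.Name;
  }
  return false;
}
```

Running a check like this over every reference in a copied PDB is one way to spot references that were copied without the module records they point to.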
>>>>>
>>>>> Did you also go through each module in the source PDB, add a new
>>>>> module in the target PDB, then copy all of the symbols for each one?
>>>>>
>>>>> The best way to find differences is by using llvm-pdbutil on the
>>>>> source and target PDBs and looking for things that look different.  For
>>>>> example, I'd start with llvm-pdbutil dump -streams and then seeing if they
>>>>> even have all the same streams.  If one of them is missing streams, that's
>>>>> a good place to start.  If they have the same streams, then look for ones
>>>>> where the size is different.  Then drill into those to see why the size is
>>>>> different.
>>>>>
>>>>> LMK if that helps.
>>>>>
>>>>> On Mon, Jan 21, 2019 at 10:03 AM Vivien Millet <vivien.millet at gmail.com>
>>>>> wrote:
>>>>>
>>>>>> For now I'm not merging my JIT CodeView section, I only try to build
>>>>>> a pure copy of an existing PDB using the XxxBuilder classes (PDBFileBuilder
>>>>>> & Co / reading a PDBFile) and check if Visual Studio wants to eat it..
>>>>>> For Publics and Globals, what I do is naive: I use the
>>>>>> GSIStreamBuilder and pray :)
>>>>>>
>>>>>>
>>>>>>
>>>>>>   // Globals: copy each record referenced by the globals table.
>>>>>>   if (File.hasPDBGlobalsStream() && File.getPDBGlobalsStream()) {
>>>>>>     GSIStreamBuilder &builder = this->getGsiBuilder();
>>>>>>     GlobalsStream &stream = *File.getPDBGlobalsStream();
>>>>>>     SymbolStream &SymbolRecords = cantFail(File.getPDBSymbolStream());
>>>>>>
>>>>>>     for (uint32_t PubSymOff : stream.getGlobalsTable()) {
>>>>>>       CVSymbol Sym = SymbolRecords.readRecord(PubSymOff);
>>>>>>       builder.addGlobalSymbol(Sym);
>>>>>>     }
>>>>>>   }
>>>>>>   // Publics: deserialize each record, sort by name, then re-add.
>>>>>>   if (File.hasPDBPublicsStream() && File.getPDBPublicsStream()) {
>>>>>>     GSIStreamBuilder &builder = this->getGsiBuilder();
>>>>>>     PublicsStream &stream = *File.getPDBPublicsStream();
>>>>>>     SymbolStream &SymbolRecords = cantFail(File.getPDBSymbolStream());
>>>>>>
>>>>>>     std::vector<PublicSym32> Publics;
>>>>>>
>>>>>>     for (uint32_t PubSymOff : stream.getPublicsTable()) {
>>>>>>       PublicSym32 Pub = cantFail(
>>>>>>           llvm::codeview::SymbolDeserializer::deserializeAs<PublicSym32>(
>>>>>>               SymbolRecords.readRecord(PubSymOff)));
>>>>>>       Publics.push_back(Pub);
>>>>>>     }
>>>>>>
>>>>>>     if (!Publics.empty()) {
>>>>>>       // Sort the public symbols and add them to the stream.
>>>>>>       std::sort(Publics.begin(), Publics.end(),
>>>>>>                 [](const PublicSym32 &L, const PublicSym32 &R) {
>>>>>>                   return L.Name < R.Name;
>>>>>>                 });
>>>>>>       for (const PublicSym32 &Pub : Publics)
>>>>>>         builder.addPublicSymbol(Pub);
>>>>>>     }
>>>>>>   }
>>>>>>
>>>>>> Is it what you meant ?
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Mon, Jan 21, 2019 at 18:50, Zachary Turner <zturner at google.com> wrote:
>>>>>>
>>>>>>> Also, even if symbolGoesInGlobalsStream returns true, you can’t just
>>>>>>> copy it. Functions, for example, which are S_GPROC32 or S_LPROC32 in the
>>>>>>> module stream, are S_PROCREF in the globals stream. Similarly, *everything*
>>>>>>> in the publics stream is S_PUB32. So you need to convert each symbol to the
>>>>>>> proper type for the stream it’s going to go in
>>>>>>> On Mon, Jan 21, 2019 at 9:46 AM Zachary Turner <zturner at google.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Publics are basically a list of everything that has a mangled name.
>>>>>>>> To be honest, I don’t know what the debugger uses this for.
>>>>>>>>
>>>>>>>> Globals is essentially every symbol in the pdb in one large table.
>>>>>>>> The reason this is important is because if you type “foo” in the watch
>>>>>>>> window, the debugger doesn’t necessarily know what compiland foo comes
>>>>>>>> from. So it has to have a way to find everything in the entire program no
>>>>>>>> matter what compiland it came from. That’s what the globals are.
>>>>>>>>
>>>>>>>> Both publics and globals are hash tables, so one possible reason
>>>>>>>> there might be a problem is that you need to rehash the entire table. When
>>>>>>>> you build your modified pdb, I would suggest starting with an empty publics
>>>>>>>> / globals stream, adding all items from the first pdb by iterating over
>>>>>>>> those records and using a GlobalsStreamBuilder, then adding all your jitted
>>>>>>>> items separately, then writing it out. That should make sure it gets hashed
>>>>>>>> correctly.
>>>>>>>>
>>>>>>>> Are you doing that?
>>>>>>>>
>>>>>>>> Btw, not all symbols belong in the globals / publics stream. Check
>>>>>>>> the code in lld and search for symbolGoesInGlobalsStream and
>>>>>>>> symbolGoesInPublicsStream to see the logic it uses
>>>>>>>> On Mon, Jan 21, 2019 at 8:36 AM Vivien Millet <vivien.millet at gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi Zachary, sorry for disturbing you again..
>>>>>>>>>
>>>>>>>>> I've fixed some problems (StringTable, SectionMap and a few things
>>>>>>>>> here and there..) and my converted PDB now seems to work inside Visual
>>>>>>>>> Studio..
>>>>>>>>> But I'm not sure I have full debug features, because I haven't
>>>>>>>>> managed to translate Publics and Globals correctly. CVDump says the PDB is
>>>>>>>>> corrupted, whereas llvm-pdbutil dump displays them correctly.
>>>>>>>>> I don't really understand what the Publics and Globals streams really
>>>>>>>>> are, whether the symbols are really in the corresponding streams or
>>>>>>>>> whether they are just references to somewhere else.
>>>>>>>>> The LLVM documentation is not complete about these two streams,
>>>>>>>>> so I'm a bit lost on how to handle them or find what is
>>>>>>>>> "corrupted" according to CVDump.
>>>>>>>>> I took LLD and yaml2pdb as examples to help me do some tough
>>>>>>>>> conversions, but I noticed that in yaml2pdb no GSI stream is exported
>>>>>>>>> (no GsiBuilder use and no reference to Publics or Globals anywhere). Is
>>>>>>>>> that intended/correct ?
>>>>>>>>> Thanks, and sorry if I'm spamming a bit. This is my 99% time task right
>>>>>>>>> now and being stuck without any clue is difficult :) But I guess you
>>>>>>>>> experienced even more suffering when documentation didn't exist at all !
>>>>>>>>> Have a good day !
>>>>>>>>>
>>>>>>>>> On Sun, Jan 20, 2019 at 22:27, Vivien Millet <vivien.millet at gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> ERRATUM, my bad: the pdb I tested is also corrupted according to
>>>>>>>>>> cvdump.exe, I don't know why. I regenerated it and now I have a working
>>>>>>>>>> dump. You don't need to fix anything.
>>>>>>>>>>
>>>>>>>>>> On Sun, Jan 20, 2019 at 20:26, Vivien Millet <vivien.millet at gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Zachary,
>>>>>>>>>>> I've done a first step towards rewriting an existing PDBFile with
>>>>>>>>>>> PDBFileBuilder. I got most of the work done, but I don't get as much
>>>>>>>>>>> output as input (some streams are not mirrored for unknown reasons and some
>>>>>>>>>>> data must be missing here and there...).
>>>>>>>>>>> When I try to replace the original with the rebuilt one for
>>>>>>>>>>> debugging, the pdb loads well but breakpoints fail to activate, with an
>>>>>>>>>>> "unexpected symbol reader error while processing foobar.exe". You probably
>>>>>>>>>>> know what it means or have already encountered this error, I guess.
>>>>>>>>>>> I also tried to create a minimal program to simplify comparisons
>>>>>>>>>>> between the original and the new PDB, but I get an error dumping the
>>>>>>>>>>> original pdb exported by Visual Studio with -all (PublicsStream.cpp|98).
>>>>>>>>>>> I think it is a bug.
>>>>>>>>>>> I've attached the related main.cpp and PDB to this email if you
>>>>>>>>>>> want to check what the error is exactly (vs2017; x86 and x64 have the same
>>>>>>>>>>> issues).
>>>>>>>>>>> I've also attached my code (git diff). I added an « identity »
>>>>>>>>>>> feature to pdbutil which uses the code I wrote to regenerate the input pdb.
>>>>>>>>>>> You can use it to see what I have so far..
>>>>>>>>>>> I’ve seen you recently added a fix related to FPO, but you say
>>>>>>>>>>> it’s only for x86, so I don’t think it would change anything, but who knows..
>>>>>>>>>>> Anyway, if you have a moment to check my work so far and give me
>>>>>>>>>>> feedback, it’s welcome, because I am out of ideas about what goes wrong..
>>>>>>>>>>> Thanks, I'm going back to digging into the pdb mysteries !
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Jan 18, 2019 at 12:31, Vivien Millet <vivien.millet at gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Ok ! It was just to be sure I understood well.
>>>>>>>>>>>> Sorry for not replying right away, I wanted to first try to emit
>>>>>>>>>>>> CodeView before continuing the discussion, and it was time for me to go to
>>>>>>>>>>>> bed here..
>>>>>>>>>>>> I just tried it now and it is very easy to switch to CodeView.
>>>>>>>>>>>> For those interested: you just have to give your TargetTriple to the
>>>>>>>>>>>> llvm::Module used for JIT and then call
>>>>>>>>>>>> module->addModuleFlag(llvm::Module::Warning, "CodeView", 1) to tell the
>>>>>>>>>>>> AsmPrinter that this module prefers CodeView over DWARF.
>>>>>>>>>>>> I've checked the content of my .obj file, and there are valid
>>>>>>>>>>>> .debug$T and .debug$S sections, so everything is going well so far.
>>>>>>>>>>>> Now, as a parallel task, I will try to read the EXE PDB and
>>>>>>>>>>>> re-export it "as is" to see if I break something in Visual Studio.
>>>>>>>>>>>> If I succeed in doing that, it might be added as a feature to
>>>>>>>>>>>> PDBFile or PDBFileBuilder to simplify the process for other users.
>>>>>>>>>>>> I'll keep you posted.
>>>>>>>>>>>> Thanks
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Jan 17, 2019 at 20:50, Zachary Turner <zturner at google.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> When I say "nothing to do" I just mean that you won't have to
>>>>>>>>>>>>> do anything to convert the record from one format (DWARF) to another format
>>>>>>>>>>>>> (CodeView).  You will have a COFF object file either on disk (probably
>>>>>>>>>>>>> named foo.obj or something) or in memory.  And this object file will have a
>>>>>>>>>>>>> .debug$S section with CodeView symbols and a .debug$T section with CodeView
>>>>>>>>>>>>> types.  Then you will still need to use the PDBFileBuilder to add these
>>>>>>>>>>>>> records to the final PDB, but they will already be in the correct format
>>>>>>>>>>>>> that PDBFileBuilder expects, you won't need to convert them from DWARF
>>>>>>>>>>>>> (which is not trivial).
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Jan 17, 2019 at 11:26 AM Vivien Millet <vivien.millet at gmail.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> That’s a good question. By default, when emitting the object
>>>>>>>>>>>>>> file I choose COFF, but it embeds DWARF and not CodeView in the end.. there
>>>>>>>>>>>>>> is probably a way to do it, or at least it must be implemented if not yet..
>>>>>>>>>>>>>> Let's imagine I manage to do that.. when you say there is
>>>>>>>>>>>>>> nothing to do, I still need a PDBFileBuilder to copy the CodeView data
>>>>>>>>>>>>>> into the EXE PDB, right ? I cannot insert it easily into the EXE PDB
>>>>>>>>>>>>>> another way ?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Thu, Jan 17, 2019 at 20:01, Zachary Turner <zturner at google.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Well, is it possible to just hook up the CodeView debug info
>>>>>>>>>>>>>>> generator to MCJIT?  If you're not jitting, and you just compile something,
>>>>>>>>>>>>>>> we translate all of the LLVM metadata into CodeView in the file
>>>>>>>>>>>>>>> CodeViewDebug.cpp.  Then, the object file just already has CodeView in it.
>>>>>>>>>>>>>>> If it's not hard to do, this would probably be a better solution, because
>>>>>>>>>>>>>>> you don't have to worry about *how* to translate DWARF into CodeView, which
>>>>>>>>>>>>>>> is not always trivial.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> If you can configure this in MCJIT, you won't even need to
>>>>>>>>>>>>>>> do anything, you can just open the ObjectFile, look for the .debug$T and
>>>>>>>>>>>>>>> .debug$S sections, iterate over each one and re-write their TypeIndices
>>>>>>>>>>>>>>> while copying them to the output PDB file.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Thu, Jan 17, 2019 at 10:52 AM Vivien Millet <vivien.millet at gmail.com>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Ok, I understand better what you meant. In fact I don’t care
>>>>>>>>>>>>>>>> about the pdb size, at least as a first step, so having duplicated
>>>>>>>>>>>>>>>> symbols won’t be a problem for me. Concerning TypeIndices, my plan, if
>>>>>>>>>>>>>>>> possible, is not to generate a pdb for my JIT code and merge it, but
>>>>>>>>>>>>>>>> instead to directly extract debug info from a DwarfContext just after
>>>>>>>>>>>>>>>> the llvm::object::ObjectFile is emitted by the JIT engine, and complete
>>>>>>>>>>>>>>>> the EXE PDB I had rebuilt with PDBFileBuilder. Does that sound like a
>>>>>>>>>>>>>>>> good bet to you ? If I succeed in doing that, I think it could be a
>>>>>>>>>>>>>>>> good extension to the debugging possibilities of MCJIT, if not an
>>>>>>>>>>>>>>>> extension to pdbutil.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Thu, Jan 17, 2019 at 19:37, Zachary Turner <zturner at google.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Well, for example the TPI stream is just one big
>>>>>>>>>>>>>>>>> collection of types.  Presumably your JIT code will reuse some of the same
>>>>>>>>>>>>>>>>> types (perhaps, std::string for example) as your non-jitted code.  Your
>>>>>>>>>>>>>>>>> jitted symbol records in the object file (for example, a local variable of
>>>>>>>>>>>>>>>>> type std::string in your jitted code) will refer to the type for
>>>>>>>>>>>>>>>>> std::string by a TypeIndex, and your original PDB will also refer to
>>>>>>>>>>>>>>>>> std::string by a different TypeIndex.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> In LLD, when we merge in types and symbols from each
>>>>>>>>>>>>>>>>> object file, we keep a hash table of which types have already been seen, so
>>>>>>>>>>>>>>>>> that if we see the same type again, we can just use the TypeIndex that we
>>>>>>>>>>>>>>>>> wrote on a previous object file.  Then, when we add symbol records, we have
>>>>>>>>>>>>>>>>> to update its fields that used the old TypeIndex to use the new TypeIndex
>>>>>>>>>>>>>>>>> instead.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> De-duplicating though, I suppose, is not strictly
>>>>>>>>>>>>>>>>> necessary, it will just keep your PDB size down.  But you *will* need to at
>>>>>>>>>>>>>>>>> least re-write the TypeIndexes from the jitted code.  For example, you may
>>>>>>>>>>>>>>>>> decide that instead of de-duplicating, you just append them all to the end
>>>>>>>>>>>>>>>>> of the TPI stream (where all the types go in PDB) to keep things simple.
>>>>>>>>>>>>>>>>> Since they were in a different position before, they now have different
>>>>>>>>>>>>>>>>> TypeIndices.  So you will need to re-write all TypeIndices so that they are
>>>>>>>>>>>>>>>>> correct after the merge.   Both types and symbols can refer to types, so
>>>>>>>>>>>>>>>>> you will need to do this both for the types of the jitted code as well as
>>>>>>>>>>>>>>>>> the symbols of the jitted code.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Let me know if that makes sense.
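
The "append without de-duplicating" variant described above can be sketched as follows. `TypeRecord`, `SymbolRecord`, `remapIndex` and `appendJitTypes` are hypothetical names and simplified stand-ins for the real CodeView records. The only CodeView fact relied on is that type indices below 0x1000 denote built-in simple types, which must not be shifted.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// In CodeView, indices below 0x1000 are simple (built-in) types.
constexpr std::uint32_t FirstNonSimpleIndex = 0x1000;

// Hypothetical, simplified records: only the type references matter here.
struct TypeRecord {
  std::string Description;
  std::vector<std::uint32_t> ReferencedTypes;
};

struct SymbolRecord {
  std::string Name;
  std::uint32_t Type;
};

// Simple types keep their index; every other index moves up by Delta.
std::uint32_t remapIndex(std::uint32_t Index, std::uint32_t Delta) {
  return Index < FirstNonSimpleIndex ? Index : Index + Delta;
}

// Appends the JIT types to the end of the TPI list and rewrites every
// TypeIndex in both the JIT types and the JIT symbols.
void appendJitTypes(std::vector<TypeRecord> &Tpi,
                    std::vector<TypeRecord> JitTypes,
                    std::vector<SymbolRecord> &JitSymbols) {
  // JIT record i used to be 0x1000 + i; after the append it becomes
  // 0x1000 + (number of existing records) + i.
  const std::uint32_t Delta = static_cast<std::uint32_t>(Tpi.size());
  for (TypeRecord &T : JitTypes) {
    for (std::uint32_t &Ref : T.ReferencedTypes)
      Ref = remapIndex(Ref, Delta);
    Tpi.push_back(std::move(T));
  }
  for (SymbolRecord &S : JitSymbols)
    S.Type = remapIndex(S.Type, Delta);
}
```

A de-duplicating merge would replace the constant shift with a per-record lookup in a table of types already seen, as the LLD description above suggests.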
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Thu, Jan 17, 2019 at 10:24 AM Vivien Millet <vivien.millet at gmail.com>
>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Ok I see..
>>>>>>>>>>>>>>>>>> What do you mean by “making sure to de-duplicate records
>>>>>>>>>>>>>>>>>> as necessary” ?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Thu, Jan 17, 2019 at 19:09, Zachary Turner <zturner at google.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> It's possible in theory to support incremental updates
>>>>>>>>>>>>>>>>>>> to a PDB (the file format is designed specifically with that in mind).  But
>>>>>>>>>>>>>>>>>>> this functionality was never added to the PDB library since lld doesn't
>>>>>>>>>>>>>>>>>>> support incremental linking, we never really needed it.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> The "dumb" way would be to just create a new PDB file,
>>>>>>>>>>>>>>>>>>> build it using the old contents and the new contents (making sure to
>>>>>>>>>>>>>>>>>>> de-duplicate records as necessary).
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Supporting incremental updates should be possible, but
>>>>>>>>>>>>>>>>>>> most of LLVM's File I/O abstractions are based around mmapping a file and
>>>>>>>>>>>>>>>>>>> writing to it, which doesn't work when you don't know the file size in
>>>>>>>>>>>>>>>>>>> advance.  So there would be some interesting problems to solve here.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Thu, Jan 17, 2019 at 10:03 AM Vivien Millet <vivien.millet at gmail.com>
>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Hi Zachary !
>>>>>>>>>>>>>>>>>>>> Is there a way to easily create a new PDBFileBuilder
>>>>>>>>>>>>>>>>>>>> from an existing PDBFile, or can/should I do the translation myself ?
>>>>>>>>>>>>>>>>>>>> I would like to start from a builder filled with the
>>>>>>>>>>>>>>>>>>>> EXE PDB data and then complete its DBI stream with the JIT module/symbols.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Thanks !
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Wed, Jan 16, 2019 at 23:41, Vivien Millet <vivien.millet at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Thank you Zachary !
>>>>>>>>>>>>>>>>>>>>> I will have some soon I think ..
>>>>>>>>>>>>>>>>>>>>> I first need to explore the llvm-pdbutil code more
>>>>>>>>>>>>>>>>>>>>> because I don't even know where to start with the PDB api..
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Wed, Jan 16, 2019 at 22:51, Zachary Turner <zturner at google.com> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Sure. Along the way I’m happy to answer any specific
>>>>>>>>>>>>>>>>>>>>>> questions you might have too even if it’s for your downstream project
>>>>>>>>>>>>>>>>>>>>>> On Wed, Jan 16, 2019 at 1:38 PM Vivien Millet <vivien.millet at gmail.com>
>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> I would be up for improving pdbutil, but I doubt I have
>>>>>>>>>>>>>>>>>>>>>>> enough knowledge or time to provide the complete merge feature. It would
>>>>>>>>>>>>>>>>>>>>>>> still be a very specific kind of merge as you describe it. Anyway, I could
>>>>>>>>>>>>>>>>>>>>>>> start by trying to do it in my jit compiler and then, once I get something
>>>>>>>>>>>>>>>>>>>>>>> working (if that happens :)), I can come back to you with the piece of code
>>>>>>>>>>>>>>>>>>>>>>> and see if it is worth integrating into pdbutil, and how ?
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On Wed, Jan 16, 2019 at 22:12, Zachary Turner <zturner at google.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Well, that’s certainly possible, but improving
>>>>>>>>>>>>>>>>>>>>>>>> llvm-pdbutil is another possibility. Doing it directly in your jit compiler
>>>>>>>>>>>>>>>>>>>>>>>> will probably save you time though, since you won’t have to worry about
>>>>>>>>>>>>>>>>>>>>>>>> writing tests and going through code review
>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Jan 16, 2019 at 1:01 PM Vivien Millet <vivien.millet at gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for the tips !
>>>>>>>>>>>>>>>>>>>>>>>>> When you talk about doing all of this, I suppose
>>>>>>>>>>>>>>>>>>>>>>>>> you are thinking of using llvm/debuginfo/pdb, picking code here and there to
>>>>>>>>>>>>>>>>>>>>>>>>> generate the pdb in memory, reading the executable's pdb and performing the
>>>>>>>>>>>>>>>>>>>>>>>>> merge directly in my jit compiler, right ? Not using pdbutil ?
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> On Tue, Jan 15, 2019 at 22:49, Zachary Turner <zturner at google.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> On Tue, Jan 15, 2019 at 2:50 AM Vivien Millet <vivien.millet at gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Hello Zachary !
>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for your time !
>>>>>>>>>>>>>>>>>>>>>>>>>>> So you are one of the happy guys who suffered
>>>>>>>>>>>>>>>>>>>>>>>>>>> from the lack of PDB format information :)
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Yes, that would be me :)
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> To be honest I'm really a beginner in the PDB
>>>>>>>>>>>>>>>>>>>>>>>>>>> stuff, I just read some llvm documentation to understand what went wrong
>>>>>>>>>>>>>>>>>>>>>>>>>>> when merging my PDBs.
>>>>>>>>>>>>>>>>>>>>>>>>>>> In my case, what my team and I are trying to
>>>>>>>>>>>>>>>>>>>>>>>>>>> achieve is this:
>>>>>>>>>>>>>>>>>>>>>>>>>>> - Run our application under the Visual Studio
>>>>>>>>>>>>>>>>>>>>>>>>>>>   debugger
>>>>>>>>>>>>>>>>>>>>>>>>>>> - Generate JIT code (using llvm MCJIT)
>>>>>>>>>>>>>>>>>>>>>>>>>>> - Then, either:
>>>>>>>>>>>>>>>>>>>>>>>>>>>   - export a COFF obj file with dwarf
>>>>>>>>>>>>>>>>>>>>>>>>>>>     information and then convert it with cv2pdb to obtain a pdb of my JIT
>>>>>>>>>>>>>>>>>>>>>>>>>>>     symbols (what I do now)
>>>>>>>>>>>>>>>>>>>>>>>>>>>   - export my JIT debug info directly to PDB
>>>>>>>>>>>>>>>>>>>>>>>>>>>     (what I would like to do, if you have an idea how..)
>>>>>>>>>>>>>>>>>>>>>>>>>>> - Detach the Visual Studio debugger
>>>>>>>>>>>>>>>>>>>>>>>>>>> - Merge my JIT pdb into a copy of the executable
>>>>>>>>>>>>>>>>>>>>>>>>>>>   pdb (where things start to go bad..)
>>>>>>>>>>>>>>>>>>>>>>>>>>> - Replace the original executable by the copy
>>>>>>>>>>>>>>>>>>>>>>>>>>>   (creating a backup of the original)
>>>>>>>>>>>>>>>>>>>>>>>>>>> - Reattach the Visual Studio debugger to my
>>>>>>>>>>>>>>>>>>>>>>>>>>>   executable (loading the new pdb version)
>>>>>>>>>>>>>>>>>>>>>>>>>>>
- Debug JIT code with visual studio.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
- On each JIT rebuild, restart these steps from
>>>>>>>>>>>>>>>>>>>>>>>>>>>
the original native executable PDB to avoid merge conflict between the
>>>>>>>>>>>>>>>>>>>>>>>>>>>
multiple JIT iterations
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
Yea, it's an interesting use case.  It makes me
>>>>>>>>>>>>>>>>>>>>>>>>>>
think it would be nice if the PDB format supported some way of having a
>>>>>>>>>>>>>>>>>>>>>>>>>>
symbol which simply refers to another PDB file, that way you could re-write
>>>>>>>>>>>>>>>>>>>>>>>>>>
that PDB file at runtime once all your code is jitted, and when the
>>>>>>>>>>>>>>>>>>>>>>>>>>
debugger tries to look up that symbol, it finds a record that tells it to
>>>>>>>>>>>>>>>>>>>>>>>>>>
go check the other PDB file.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
So, here are the things I think you would need to
>>>>>>>>>>>>>>>>>>>>>>>>>>
do:
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
1) Create a JIT module in the module list with a
>>>>>>>>>>>>>>>>>>>>>>>>>>
unique name.  All symbols will go here.  llvm-pdbutil dump -modules shows
>>>>>>>>>>>>>>>>>>>>>>>>>>
you the list.  Be careful about putting it at the end though, because
>>>>>>>>>>>>>>>>>>>>>>>>>>
there's already one at the end called * LINKER * that is kind of special.
>>>>>>>>>>>>>>>>>>>>>>>>>>
On the other hand, you don't want to put it first because it means you will
>>>>>>>>>>>>>>>>>>>>>>>>>>
have to do lots of fixups on the EXE PDB.  It's probably best to add it
>>>>>>>>>>>>>>>>>>>>>>>>>>
right before the linker module, this has the least chance of breaking
>>>>>>>>>>>>>>>>>>>>>>>>>>
anything.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
2) In the debug stream for this module, add all
>>>>>>>>>>>>>>>>>>>>>>>>>>
symbols.  You will need to fix up their type indices.  As you noticed,
>>>>>>>>>>>>>>>>>>>>>>>>>>
llvm-pdbutil already merges type information from the JIT PDB, so after
>>>>>>>>>>>>>>>>>>>>>>>>>>
merging the type indices in the EXE PDB will be different than they were in
>>>>>>>>>>>>>>>>>>>>>>>>>>
the JIT PDB, but the symbol records will refer to the JIT PDB type
>>>>>>>>>>>>>>>>>>>>>>>>>>
indices.  So these need to be fixed up.  LLD already has code to do this,
>>>>>>>>>>>>>>>>>>>>>>>>>>
you can probably borrow a similar algorithm with some slight modifications
>>>>>>>>>>>>>>>>>>>>>>>>>>
(lld/COFF/PDB.cpp, search for mergeSymbolRecords)
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
3) Merge in the new section contributions and
>>>>>>>>>>>>>>>>>>>>>>>>>>
section map.  See LLD again for how to modify these.  Hopefully the object
>>>>>>>>>>>>>>>>>>>>>>>>>>
file you exported contains relocated symbol addresses so you don't have to
>>>>>>>>>>>>>>>>>>>>>>>>>>
do any fixups here.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
4) Merge in the publics and globals.  This
>>>>>>>>>>>>>>>>>>>>>>>>>>
shouldn't be too hard, I think you can just iterate over them in the JIT
>>>>>>>>>>>>>>>>>>>>>>>>>>
PDB and add them to the new EXE PDB.
>>>>>>>>>>>>>>>>>>>>>>>>>>
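The type index fixup from step 2 can be sketched as follows. This is a minimal, self-contained illustration and not LLD's actual code: `SymbolRecord`, `remapTypeIndex` and `remapSymbol` are hypothetical names, and the sketch assumes the merge step has already produced a table mapping each JIT type index to its index in the merged type stream. The only CodeView fact relied on is that indices below 0x1000 denote built-in simple types.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// CodeView type indices below 0x1000 denote built-in simple types and are
// the same in every PDB; user-defined type records start at 0x1000.
constexpr uint32_t FirstNonSimpleIndex = 0x1000;

// Hypothetical stand-in for a symbol record: only the type indices it uses.
struct SymbolRecord {
  std::vector<uint32_t> TypeRefs;
};

// Map[i] is the index in the merged type stream of the JIT type that had
// index FirstNonSimpleIndex + i in the JIT object file.
uint32_t remapTypeIndex(uint32_t Old, const std::vector<uint32_t> &Map) {
  if (Old < FirstNonSimpleIndex)
    return Old; // simple type, nothing to fix up
  assert(Old - FirstNonSimpleIndex < Map.size() && "dangling type index");
  return Map[Old - FirstNonSimpleIndex];
}

// Rewrites every type reference of one symbol record in place.
void remapSymbol(SymbolRecord &Sym, const std::vector<uint32_t> &Map) {
  for (uint32_t &TI : Sym.TypeRefs)
    TI = remapTypeIndex(TI, Map);
}
```

In a real merge the map would be filled while the JIT types are appended to (or de-duplicated against) the EXE PDB's type stream, and each symbol record would be walked field by field rather than through a flat list.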
>>>>>>>>>>>>>>>>>>>>>>>>>>
You're kind of in uncharted territory here, so
>>>>>>>>>>>>>>>>>>>>>>>>>>
this is just a rough idea of what needs to be done.  There may be other
>>>>>>>>>>>>>>>>>>>>>>>>>>
issues that you don't encounter until you actually try it out.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
Unfortunately I don't personally have the time to
>>>>>>>>>>>>>>>>>>>>>>>>>>
work on this, but it sounds neat, and I'm happy to help if you run into
>>>>>>>>>>>>>>>>>>>>>>>>>>
questions or problems along the way.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>

Vivien Millet via llvm-dev

2019-Jan-28 19:48 UTC

[llvm-dev] [llvm-pdbutil] : merge not working properly

To be more precise about the crash context: it only happens when I type into
the "Watch" window a variable name that is unknown in the local context. For
example, I type "foobar", which does not exist in my program; Visual Studio
then freezes (busy cursor), crashes, and relaunches as usual. In this case I
suspect that the Globals/Publics stream is wrong, or at least that there is a
problem in my symbol records, even though my method parameter displays
correctly in the "Watch" window. If you have an idea...
Could it be a mangling problem in my symbol records? I don't use C++
mangling, so maybe parsing my symbol names triggers bugs.
Is there a C++ mangler in LLVM I can use to produce correct names?

On Mon, Jan 28, 2019 at 8:22 PM Vivien Millet <vivien.millet at gmail.com>
wrote:
> Hello Zachary,
> Sorry for replying so late, but I've spent a week thinking about and
> working hard on your "dll memory buffer" idea to see if it works and
> give you feedback!
> And it works pretty well so far.
> I'm sharing with the list what I did:
> - create a .ASM file full of "int 3" instructions (to ensure that if we
> execute past the boundaries we break instantly)
> - compile this to a .DLL
> - use a hex editor to change the ".text" section Characteristics from
> Read/Execute to Read/Write/Execute
> - run my program, which does JIT compilation
> - get the start RVA of the .text section (which is always 0x1000 in my
> case)
> - load the .DLL and use ModuleAddress+RVA as a memory buffer in a
> custom DllMemMgr I give to MCJIT
> - on NotifyObjectEmitted, replace the DLL's PDB with a custom one I build
> myself with your PDBFileBuilder
> - on finalizing memory, first reload the DLL to trigger Visual Studio PDB
> reloading (not working, I don't know why yet), ensure it goes into the
> same virtual address space, and protect the memory using VirtualProtect
> - place a breakpoint in my JIT file (it displays "loaded"), execute the
> JIT code, and it breaks
> ...and ....
> * drums *
> Visual Studio CRASHES when I open the Watch window or Locals/Auto/etc.,
> every time, and I don't know why.
> I noticed, when compiling the C++ equivalent of my JIT program, that a
> simple "int param" is written with size=20 in the C++ PDB and size=16 in
> my JIT PDB. Do you know what this "size" attribute represents in the
> S_LOCAL symbol section? I suspect the symbol section is the cause of the
> watch issue, but I am not sure. If you have an idea...
> I also got an "illegal instruction" exception when stepping with F10 after
> a break, but when I don't break, the code runs well.
>
> A lot of mysteries there again...
>
> Visual Studio displays the disassembly correctly, with the debug lines at
> the right place, etc., so I don't get why Visual Studio crashes.
> Another issue is that I always have to remove and re-add my breakpoint
> before Visual Studio really breaks, even if it says "I'm a good breakpoint
> at that good address". Is this related to file checksums? Mine seems to
> have a "none" checksum, so I suspect this is the problem, but I don't know
> how to fix it: I added the checksum with addChecksum and the right file
> name, and I still get "none" in the dump...
> So right now I'm quite hopeful because I get something reacting in Visual
> Studio, but I have no idea why it crashes.
>
> Have you already encountered this issue when testing your generated PDBs?
> Do you know the role of Section Contributions in the PDB/debugging session?
> Any tips for checking symbol record validity in the dump? It looks good to
> me: no "???" or "Error" anywhere.
>
> Thank you!
>
>
>
>
>
> On Wed, Jan 23, 2019 at 10:29 PM Zachary Turner <zturner at google.com>
> wrote:
>
>> .text is where code goes. I don't know why it's called .text; it's just
>> been that way for many decades and the name stuck around. But actually you
>> can call the section whatever you want. Maybe it's even better to call it
>> something other than .text, because .text is where your DllMain and other
>> stuff will be. You could call it .jit if you wanted to. You should be
>> able to create the section with whatever flags you want. You'll need to
>> produce a jit_code.obj, probably compiled from assembly, that makes a
>> section named .jit and sets the flags to be executable (you can just copy
>> the flags from a normal .text section of some other program). Then link
>> this file together with a jitted_code_main.obj which you compiled from a
>> simple source file with a DllMain function that does nothing. This would
>> make jitted_code.dll; then have your program link against jitted_code.lib.
>>
>> Right now you jit the code into some buffer that you created with
>> VirtualAlloc. If you do the above, it will load jitted_code.dll into
>> memory and the OS loader will allocate some memory for each section. So
>> this would be like your VirtualAlloc: you can just find the address of the
>> .jit section and use that buffer instead of the VirtualAlloc buffer as the
>> target address of your jit operations.
>>
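Using a loaded DLL section as the JIT target amounts to carving allocations out of a fixed, pre-mapped region instead of calling VirtualAlloc each time. The following is a minimal sketch of such a bump allocator, assuming the region's address and size have already been found; `FixedRegionAllocator` is a hypothetical helper, not an LLVM class, and a real MCJIT memory manager would have to wrap something like it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hands out aligned chunks from a fixed region that is already mapped
// (for instance the .jit section of a loaded jitted_code.dll).
class FixedRegionAllocator {
  uint8_t *Base;
  size_t Size;
  size_t Used = 0;

public:
  FixedRegionAllocator(uint8_t *Base, size_t Size) : Base(Base), Size(Size) {}

  // Returns nullptr when the region is exhausted.
  // Align must be a nonzero power of two.
  uint8_t *allocate(size_t Bytes, size_t Align) {
    assert(Align != 0 && (Align & (Align - 1)) == 0 && "bad alignment");
    uintptr_t Start = reinterpret_cast<uintptr_t>(Base);
    uintptr_t Aligned =
        (Start + Used + Align - 1) & ~(static_cast<uintptr_t>(Align) - 1);
    size_t Offset = static_cast<size_t>(Aligned - Start);
    if (Offset > Size || Bytes > Size - Offset)
      return nullptr;
    Used = Offset + Bytes;
    return Base + Offset;
  }

  // Offset of an allocation from the region start. Adding the section's
  // RVA to it gives the RVA to record in the PDB.
  size_t offsetOf(const uint8_t *P) const {
    return static_cast<size_t>(P - Base);
  }
};
```

Because every allocation stays inside the module's mapped range, each jitted function has a well-defined offset from the module base, which is what the debug records need.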
>> Again, this is just an idea, no promises it will work, but unfortunately
>> that's kind of the best you can do when dealing with closed source things:
>> just make guesses and hope for the best.
>>
>>
>>
>> On Wed, Jan 23, 2019 at 12:42 PM Vivien Millet <vivien.millet at gmail.com>
>> wrote:
>>
>>> (Yes, you are right, this is my fault.)
>>> Regarding the string table, it only seems to contain file-related
>>> information in every PDB I am using, and it looks correct, but I will
>>> check it.
>>> I looked at the PDB.cpp code about checksums and tables. I copied some
>>> stuff and got things wrong according to cvdump; then I simplified the
>>> process of copying the table and it worked (cvdump finds the file
>>> matching each line, etc.), so I suspect this is also correct.
>>>
>>> All the streams look good, but I will check deeper!
>>>
>>> What you say about RVAs and modules seems right, and this is what I'm
>>> afraid of: doing all of this for nothing, or almost.
>>>
>>> Your idea of a .text section in a separate DLL looks good, but will it
>>> be executable memory? .text is where static strings go, right?
>>> When you say putting my jit in there, do you mean writing it once
>>> jitted_code.dll is loaded in memory, or into the .dll file directly before
>>> loading it? In the first scenario I wonder whether the section will be
>>> executable; in the second scenario I can't do it, because it would require
>>> perfect linking with the other code my jit points to.
>>>
>>> On Wed, Jan 23, 2019 at 8:57 PM Zachary Turner <zturner at google.com>
>>> wrote:
>>>
>>>> (BTW, I'm adding llvm-dev back to the list, I didn't notice it got
>>>> taken off. In general I try to keep the list on all emails, even if it's
>>>> extremely technical and specific, because someday someone else will try
>>>> to do this, and it'll be nice if they can read the whole thread.)
>>>>
>>>> I can think of a couple of things that might be wrong:
>>>>
>>>> 1) If the string table is in a different order, then anything that
>>>> refers to the string table needs to be changed to refer to the new
>>>> offset. If the string "foo" is at offset 12 in the old PDB, but offset 15
>>>> in the new PDB, then somewhere there is a record which is going to look
>>>> at offset 12 and expect to find something, and that will mess up. The
>>>> main place this is important is in the File Checksums table: there is an
>>>> entry that says which file it is a checksum for, and that refers to the
>>>> string table. However, it's possible for certain symbol records to refer
>>>> to the string table too. See lld/COFF/PDB.cpp and Ctrl+F for "PDBStrTab"
>>>> and you will find some information about this.
>>>>
>>>> 2) When you run `llvm-pdbutil dump -streams` on the copied PDB, do all
>>>> of them show a reasonable description? Are there any streams that say
>>>> (???)? If so, that's a problem.
>>>>
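The string table problem in point 1 can be illustrated with a small sketch. It assumes a simplified table of NUL-terminated strings addressed by byte offset; `StringTable` and `remapOffset` are hypothetical names and this is not the real PDB string table layout, which also carries a hash table. The point is only that a record holding an old offset has to be rewritten with the offset the same string gets in the new table.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

// Simplified string table: NUL-terminated strings packed back to back and
// addressed by the byte offset of their first character.
class StringTable {
  std::string Data;
  std::unordered_map<std::string, uint32_t> Offsets;

public:
  // Adds S if it is not present yet and returns its offset.
  uint32_t insert(const std::string &S) {
    auto It = Offsets.find(S);
    if (It != Offsets.end())
      return It->second;
    uint32_t Off = static_cast<uint32_t>(Data.size());
    Data += S;
    Data.push_back('\0');
    Offsets.emplace(S, Off);
    return Off;
  }

  // Reads the string starting at Off, up to its terminating NUL.
  std::string at(uint32_t Off) const {
    assert(Off < Data.size() && "offset outside the table");
    return std::string(Data.c_str() + Off);
  }
};

// Translates an offset valid in Old into the offset of the same string in
// New, inserting the string into New if needed.
uint32_t remapOffset(const StringTable &Old, StringTable &New,
                     uint32_t OldOff) {
  return New.insert(Old.at(OldOff));
}
```

Every record that stores a string table offset, such as a file checksum entry, would go through `remapOffset` while being copied.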
>>>> > will visual studio consider a symbol file broken if the address
>>>> > goes beyond the official module address range (the compiled one),
>>>> > because my JIT code is allocated after the end of the module with
>>>> > VirtualAlloc
>>>>
>>>> That is a good question, and part of why my job is so difficult,
>>>> because I can't look at their code. But I think the answer is "probably".
>>>> The debugger has to have some way to convert an address in your running
>>>> process into a symbol and offset, because that's how all debug info is
>>>> represented in the PDB. So if there is no module, then there is no RVA
>>>> (because the R in RVA means relative, and what would it be relative to?).
>>>>
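The address-to-RVA conversion described above can be sketched as follows. This only illustrates the reasoning and says nothing about what Visual Studio actually does internally; `LoadedModule`, `ModuleRVA` and `toRVA` are hypothetical names, and the module list is assumed to come from the OS loader.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// One module (EXE or DLL) mapped into the process.
struct LoadedModule {
  std::string Name;
  uint64_t Base; // load address
  uint64_t Size; // size of the mapped image in bytes
};

struct ModuleRVA {
  std::string Module;
  uint32_t RVA;
};

// Converts a virtual address into (module, RVA). Returns false when the
// address lies outside every loaded module, in which case there is nothing
// for an RVA to be relative to.
bool toRVA(const std::vector<LoadedModule> &Mods, uint64_t Addr,
           ModuleRVA &Out) {
  for (const LoadedModule &M : Mods) {
    if (Addr >= M.Base && Addr - M.Base < M.Size) {
      assert(Addr - M.Base <= UINT32_MAX && "RVAs are 32-bit");
      Out.Module = M.Name;
      Out.RVA = static_cast<uint32_t>(Addr - M.Base);
      return true;
    }
  }
  return false;
}
```

Code placed in a VirtualAlloc buffer past the end of the image falls into the `false` branch, while code written into a section of jitted_code.dll resolves to that module.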
>>>> One idea to test this would be to create a DLL called jitted_code.dll,
>>>> give it a huuuuuge .text section (probably just a .asm file and use some
>>>> assembly directives to allocate a very large series of null bytes), and
>>>> then write your jit code into that area. This way you would not need to
>>>> modify the existing PDB; you would only need to make a new PDB called
>>>> jitted_code.pdb with 1 module, and those symbols could have meaningful
>>>> RVAs. And you might not even need to detach the debugger if you do things
>>>> this way, because you could just right click the jitted_code.dll module
>>>> in the modules window and choose Load Symbols.
>>>>
>>>>
>>>>
>>>> On Wed, Jan 23, 2019 at 11:13 AM Vivien Millet <vivien.millet at gmail.com>
>>>> wrote:
>>>>
>>>>> Yes, this is it: I just make a copy of a PDB generated by link.exe
>>>>> (the Microsoft one).
>>>>> Using llvm-pdbutil to compare is what I do, except I do it with "-all".
>>>>> I get almost everything the same: same number of streams, the section
>>>>> map looks good, the string table looks good (even if not in the same
>>>>> order), same number of modules with practically the same symbols and
>>>>> subsections. This is why I am stuck: I am missing something, but I can't
>>>>> see what because I don't know where to look. Visual Studio works with
>>>>> it, I can debug my original exe, but probably without the globals...
>>>>> The other problem is that a difference between the dumps is not
>>>>> necessarily a bug, because the builder may generate new hash values,
>>>>> reorder streams, modules, etc.
>>>>>
>>>>> For now I have given up on the publics and globals streams and attacked
>>>>> the real goal: inserting my JIT CodeView into the PDB. Again I have done
>>>>> "something", but as I don't understand how the format works, I don't
>>>>> have it working in Visual Studio... except once: a single time it worked
>>>>> and the breakpoint turned on in the UI (even if the RVA was broken for
>>>>> the instructions), but it happened only once, and I got depressed the
>>>>> next times. cvdump displays it all as "correct", with no apparent
>>>>> corruption. But what I do is probably wrong somewhere. I take .debug$S
>>>>> and .debug$T as is, without relocations, just to see. What I really
>>>>> don't know is: will Visual Studio consider a symbol file broken if the
>>>>> address goes beyond the official module address range (the compiled
>>>>> one)? My JIT code is allocated after the end of the module with
>>>>> VirtualAlloc.
>>>>> Another thing I don't get is the section contribution: what is it
>>>>> exactly? I inserted section contributions for all sections except the
>>>>> debug$ ones, but I don't really know what I'm doing, and that is my
>>>>> general problem while implementing this JIT feature...
>>>>> I also don't know what relocations are inside the CodeView format, what
>>>>> the difference is between an RVA and a relocation, and whether there is
>>>>> anything to do about this for the CodeView part I need to insert into
>>>>> the PDB. I don't see why Visual Studio needs more than just an
>>>>> RVA<->Line mapping.
>>>>> It is really driving me crazy to be so ignorant and to have to guess
>>>>> what Visual Studio does...
>>>>>
>>>>> On Mon, Jan 21, 2019 at 7:50 PM Zachary Turner <zturner at google.com>
>>>>> wrote:
>>>>>
>>>>>> So if I understand correctly, you're basically just trying to
>>>>>> implement something like a pdb *copy*, just as a test to see if you can
>>>>>> get it to work. So you generate a PDB with cl/link or clang-cl/lld-link,
>>>>>> then try to copy it using your tool, then see if it still works.
>>>>>>
>>>>>> If this is correct, and it's not working, then there is probably just
>>>>>> something you didn't copy. Neither publics nor globals actually contain
>>>>>> their own data; instead they just refer to records from the
>>>>>> corresponding module stream. So an S_PROCREF for the function "main"
>>>>>> might have fields that say "the name of the function is main, and it's
>>>>>> at offset 20 of module 1". So, if there is no module 1, or if offset 20
>>>>>> of module 1 is not actually an S_GPROC32 for the function main, then it
>>>>>> will be broken.
>>>>>>
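The consistency rule for such references can be written down as a small check. This is a sketch with hypothetical types (`ModuleSymbol`, `ProcRef`, `procRefIsValid`), not LLVM's reader API, and it assumes the module number in the reference is 1-based as in the "module 1" example above. It accepts both S_GPROC32 and S_LPROC32 targets, which is a choice of this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

enum class SymKind { GProc32, LProc32, Other };

// One record of a module's symbol stream, identified by its byte offset.
struct ModuleSymbol {
  uint32_t Offset;
  SymKind Kind;
  std::string Name;
};
using ModuleStream = std::vector<ModuleSymbol>;

// What a procedure reference in the globals stream claims.
struct ProcRef {
  std::string Name;
  uint16_t Module;    // 1-based module number (assumption of this sketch)
  uint32_t SymOffset; // offset of the procedure record in that module
};

// True if the reference points at an existing procedure record with the
// same name in an existing module.
bool procRefIsValid(const ProcRef &Ref,
                    const std::vector<ModuleStream> &Modules) {
  assert(!Ref.Name.empty() && "sketch precondition: refs carry a name");
  if (Ref.Module == 0 || Ref.Module > Modules.size())
    return false; // no such module
  for (const ModuleSymbol &S : Modules[Ref.Module - 1]) {
    if (S.Offset != Ref.SymOffset)
      continue;
    bool IsProc = S.Kind == SymKind::GProc32 || S.Kind == SymKind::LProc32;
    return IsProc && S.Name == Ref.Name;
  }
  return false; // nothing starts at that offset
}
```

Running a check like this over every reference in a copied PDB is one way to find entries that still carry offsets from the source file.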
>>>>>> Did you also go through each module in the source PDB, add a new
>>>>>> module in the target PDB, then copy all of the symbols for each one?
>>>>>>
>>>>>> The best way to find differences is by using llvm-pdbutil on the
>>>>>> source and target PDBs and looking for things that look different. For
>>>>>> example, I'd start with llvm-pdbutil dump -streams and then see if they
>>>>>> even have all the same streams. If one of them is missing streams,
>>>>>> that's a good place to start. If they have the same streams, then look
>>>>>> for ones where the size is different. Then drill into those to see why
>>>>>> the size is different.
>>>>>>
>>>>>> LMK if that helps.
>>>>>>
>>>>>> On Mon, Jan 21, 2019 at 10:03 AM Vivien Millet <
>>>>>> vivien.millet at gmail.com> wrote:
>>>>>>
>>>>>>> For now I'm not merging my JIT CodeView section; I am only trying to
>>>>>>> build a pure copy of an existing PDB using the XxxBuilder classes
>>>>>>> (PDBFileBuilder & Co, reading a PDBFile) and checking whether Visual
>>>>>>> Studio wants to eat it.
>>>>>>> For Publics and Globals, what I do is naive: I use the
>>>>>>> GSIStreamBuilder and pray :)
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>   if (File.hasPDBGlobalsStream() && File.getPDBGlobalsStream()) {
>>>>>>>     GSIStreamBuilder &builder = this->getGsiBuilder();
>>>>>>>     GlobalsStream &stream = *File.getPDBGlobalsStream();
>>>>>>>     SymbolStream &SymbolRecords = cantFail(File.getPDBSymbolStream());
>>>>>>>
>>>>>>>     for (uint32_t PubSymOff : stream.getGlobalsTable()) {
>>>>>>>       CVSymbol Sym = SymbolRecords.readRecord(PubSymOff);
>>>>>>>       builder.addGlobalSymbol(Sym);
>>>>>>>     }
>>>>>>>   }
>>>>>>>   if (File.hasPDBPublicsStream() && File.getPDBPublicsStream()) {
>>>>>>>     GSIStreamBuilder &builder = this->getGsiBuilder();
>>>>>>>     PublicsStream &stream = *File.getPDBPublicsStream();
>>>>>>>     SymbolStream &SymbolRecords = cantFail(File.getPDBSymbolStream());
>>>>>>>
>>>>>>>     std::vector<PublicSym32> Publics;
>>>>>>>
>>>>>>>     for (uint32_t PubSymOff : stream.getPublicsTable()) {
>>>>>>>       PublicSym32 Pub = cantFail(
>>>>>>>           llvm::codeview::SymbolDeserializer::deserializeAs<PublicSym32>(
>>>>>>>               SymbolRecords.readRecord(PubSymOff)));
>>>>>>>       Publics.push_back(Pub);
>>>>>>>     }
>>>>>>>
>>>>>>>     if (!Publics.empty()) {
>>>>>>>       // Sort the public symbols and add them to the stream.
>>>>>>>       std::sort(Publics.begin(), Publics.end(),
>>>>>>>            [](const PublicSym32 &L, const PublicSym32 &R) {
>>>>>>>              return L.Name < R.Name;
>>>>>>>            });
>>>>>>>       for (const PublicSym32 &Pub : Publics)
>>>>>>>         builder.addPublicSymbol(Pub);
>>>>>>>     }
>>>>>>>
>>>>>>>   }
>>>>>>>
>>>>>>> Is this what you meant?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Jan 21, 2019 at 6:50 PM Zachary Turner <zturner at google.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Also, even if symbolGoesInGlobalsStream returns true, you can’t
>>>>>>>> just copy it. Functions, for example, which are S_GPROC32 or
>>>>>>>> S_LPROC32 in the module stream, are S_PROCREF in the globals stream.
>>>>>>>> Similarly, *everything* in the publics stream is S_PUB32. So you need
>>>>>>>> to convert each symbol to the proper type for the stream it’s going
>>>>>>>> to go in.
>>>>>>>>
>>>>>>>> On Mon, Jan 21, 2019 at 9:46 AM Zachary Turner <zturner at google.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Publics are basically a list of everything that has a mangled
>>>>>>>>> name. To be honest, I don’t know what the debugger uses this for.
>>>>>>>>>
>>>>>>>>> Globals is essentially every symbol in the pdb in one large table.
>>>>>>>>> The reason this is important is because if you type “foo” in the
>>>>>>>>> watch window, the debugger doesn’t necessarily know what compiland
>>>>>>>>> foo comes from. So it has to have a way to find everything in the
>>>>>>>>> entire program no matter what compiland it came from. That’s what
>>>>>>>>> the globals are.
>>>>>>>>>
>>>>>>>>> Both publics and globals are hash tables, so one possible reason
>>>>>>>>> there might be a problem is that you need to rehash the entire
>>>>>>>>> table. When you build your modified pdb, I would suggest starting
>>>>>>>>> with an empty publics / globals stream, adding all items from the
>>>>>>>>> first pdb by iterating over those records and using a
>>>>>>>>> GSIStreamBuilder, then adding all your jitted items separately, then
>>>>>>>>> writing it out. That should make sure it gets hashed correctly.
>>>>>>>>>
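The "start empty and re-add everything" advice can be illustrated with a toy hash table. This is only a sketch of why rebuilding is safer than patching: the hash function is FNV-1a as a stand-in (the real PDB tables use their own hash and on-disk layout), and `buildTable` and `contains` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Stand-in hash (FNV-1a). The real globals/publics tables use a different,
// PDB-specific hash function.
uint32_t hashName(const std::string &S) {
  uint32_t H = 2166136261u;
  for (unsigned char C : S) {
    H ^= C;
    H *= 16777619u;
  }
  return H;
}

using Buckets = std::vector<std::vector<std::string>>;

// Builds the table from scratch: first the names copied from the EXE PDB,
// then the jitted names. Every entry is hashed against the final bucket
// count, so no stale bucket assignment can survive.
Buckets buildTable(const std::vector<std::string> &ExeNames,
                   const std::vector<std::string> &JitNames,
                   size_t NumBuckets) {
  assert(NumBuckets > 0 && "need at least one bucket");
  Buckets B(NumBuckets);
  for (const std::string &N : ExeNames)
    B[hashName(N) % NumBuckets].push_back(N);
  for (const std::string &N : JitNames)
    B[hashName(N) % NumBuckets].push_back(N);
  return B;
}

// Lookup by name, the way a "type foo in the watch window" query would
// have to work without knowing the compiland.
bool contains(const Buckets &B, const std::string &Name) {
  for (const std::string &N : B[hashName(Name) % B.size()])
    if (N == Name)
      return true;
  return false;
}
```

A lookup for a name that was never added, such as "foobar", simply comes back empty here, which is the behaviour one would hope for from a correctly hashed table.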
>>>>>>>>> Are you doing that?
>>>>>>>>>
>>>>>>>>> Btw, not all symbols belong in the globals / publics stream. Check
>>>>>>>>> the code in lld and search for symbolGoesInGlobalsStream and
>>>>>>>>> symbolGoesInPublicsStream to see the logic it uses.
>>>>>>>>>
>>>>>>>>> On Mon, Jan 21, 2019 at 8:36 AM Vivien Millet <
>>>>>>>>> vivien.millet at gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Zachary, sorry for disturbing you again.
>>>>>>>>>>
>>>>>>>>>> I've fixed some problems (StringTable, SectionMap and a few things
>>>>>>>>>> here and there) and my converted PDB now seems to work inside
>>>>>>>>>> Visual Studio.
>>>>>>>>>> But I'm not sure I have full debug features, because I haven't
>>>>>>>>>> managed to translate Publics and Globals correctly. cvdump says the
>>>>>>>>>> PDB is corrupted, whereas llvm-pdbutil dump displays them
>>>>>>>>>> correctly.
>>>>>>>>>> I don't really understand what the Publics and Globals streams
>>>>>>>>>> are: whether the symbols are really in the corresponding streams or
>>>>>>>>>> are just references to somewhere else.
>>>>>>>>>> The LLVM documentation is incomplete about these two streams, so
>>>>>>>>>> I'm a bit lost on how to handle them or find what is "corrupted"
>>>>>>>>>> according to cvdump.
>>>>>>>>>> I used LLD and yaml2pdb as examples to help me with some tough
>>>>>>>>>> conversions, but I noticed that yaml2pdb exports no GSI stream (no
>>>>>>>>>> GSIStreamBuilder use and no reference to Publics or Globals
>>>>>>>>>> anywhere). Is that intended/correct?
>>>>>>>>>> Thanks, and sorry if I'm spamming a bit; this is my 99%-time task
>>>>>>>>>> right now and being stuck without any clue is difficult :) But I
>>>>>>>>>> guess you suffered even more when documentation didn't exist at
>>>>>>>>>> all!
>>>>>>>>>> Have a good day!
>>>>>>>>>>
>>>>>>>>>> On Sun, Jan 20, 2019 at 10:27 PM Vivien Millet <
>>>>>>>>>> vivien.millet at gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> ERRATUM, my bad: the pdb I tested is also corrupted according to
>>>>>>>>>>> cvdump.exe, I don't know why. I regenerated it and now I have a
>>>>>>>>>>> working dump. You don't need to fix anything.
>>>>>>>>>>>
>>>>>>>>>>> On Sun, Jan 20, 2019 at 8:26 PM Vivien Millet <
>>>>>>>>>>> vivien.millet at gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Zachary,
>>>>>>>>>>>> I've done a first step towards rewriting an existing PDBFile with
>>>>>>>>>>>> PDBFileBuilder. I got most of the work done, but I don't get as
>>>>>>>>>>>> much output as input (some streams are not mirrored for unknown
>>>>>>>>>>>> reasons and some data must be missing here and there...).
>>>>>>>>>>>> When I try to replace the original with the rebuilt one for
>>>>>>>>>>>> debugging, the pdb loads well but breakpoints fail to activate,
>>>>>>>>>>>> with an "unexpected symbol reader error while processing
>>>>>>>>>>>> foobar.exe". I guess you know what it means or have already
>>>>>>>>>>>> encountered this error.
>>>>>>>>>>>> I also tried to create a minimal program to simplify comparisons
>>>>>>>>>>>> between the original and new PDB, but I get an error dumping the
>>>>>>>>>>>> original pdb exported by Visual Studio with -all
>>>>>>>>>>>> (PublicsStream.cpp|98). I think it is a bug.
>>>>>>>>>>>> I've attached the related main.cpp and PDB to this email if you
>>>>>>>>>>>> want to check what the error is exactly (vs2017; x86 and x64 have
>>>>>>>>>>>> the same issue).
>>>>>>>>>>>> I've also attached my code (git diff). I added an "identity"
>>>>>>>>>>>> feature to pdbutil which uses the code I wrote to regenerate the
>>>>>>>>>>>> input pdb. You can use it to see what I get so far.
>>>>>>>>>>>> I’ve seen you recently added a fix related to FPO, but you say
>>>>>>>>>>>> it’s only for x86, so I don’t think it would change anything, but
>>>>>>>>>>>> who knows.
>>>>>>>>>>>> Anyway, if you have a moment to check my work so far and give me
>>>>>>>>>>>> feedback, it’s welcome, because I'm running out of ideas about
>>>>>>>>>>>> what goes wrong.
>>>>>>>>>>>> Thanks, I'm going back to digging into the pdb mysteries!
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Jan 18, 2019 at 12:31 PM Vivien Millet <
>>>>>>>>>>>> vivien.millet at gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> OK! It was just to be sure I understood correctly.
>>>>>>>>>>>>> Sorry for not replying right away; I wanted to try emitting
>>>>>>>>>>>>> CodeView first before continuing the discussion, and it was
>>>>>>>>>>>>> time for me to go to bed here.
>>>>>>>>>>>>> I just tried it and it is very easy to switch to CodeView.
>>>>>>>>>>>>> For those interested: you just have to set your TargetTriple on
>>>>>>>>>>>>> the llvm::Module used for JIT and then call
>>>>>>>>>>>>> module->addModuleFlag(llvm::Module::Warning, "CodeView", 1) to
>>>>>>>>>>>>> tell the AsmPrinter that this module prefers CodeView over
>>>>>>>>>>>>> DWARF.
>>>>>>>>>>>>> I've checked the content of my .obj file, and there are valid
>>>>>>>>>>>>> .debug$T and .debug$S sections, so everything is going well so
>>>>>>>>>>>>> far.
>>>>>>>>>>>>> Now, as a parallel task, I will try to read the EXE PDB and
>>>>>>>>>>>>> re-export it "as is" to see if I break something in Visual
>>>>>>>>>>>>> Studio.
>>>>>>>>>>>>> If I succeed, that might be added as a feature to PDBFile or
>>>>>>>>>>>>> PDBFileBuilder to simplify the process for other users.
>>>>>>>>>>>>> I'll keep you posted.
>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Jan 17, 2019 at 8:50 PM Zachary Turner <
>>>>>>>>>>>>> zturner at google.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> When I say "nothing to do" I just mean that you won't have to
>>>>>>>>>>>>>> do anything to convert the record from one format (DWARF) to
>>>>>>>>>>>>>> another format (CodeView). You will have a COFF object file
>>>>>>>>>>>>>> either on disk (probably named foo.obj or something) or in
>>>>>>>>>>>>>> memory. And this object file will have a .debug$S section with
>>>>>>>>>>>>>> CodeView symbols and a .debug$T section with CodeView types.
>>>>>>>>>>>>>> Then you will still need to use the PDBFileBuilder to add
>>>>>>>>>>>>>> these records to the final PDB, but they will already be in
>>>>>>>>>>>>>> the correct format that PDBFileBuilder expects; you won't need
>>>>>>>>>>>>>> to convert them from DWARF (which is not trivial).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Thu, Jan 17, 2019 at 11:26 AM Vivien Millet <
>>>>>>>>>>>>>> vivien.millet at gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> That’s a good question. By default, when emitting the object
>>>>>>>>>>>>>>> file I choose COFF, but it embeds DWARF and not CodeView in
>>>>>>>>>>>>>>> the end. There probably is a way to do it, or at least it
>>>>>>>>>>>>>>> must be implemented if it isn't yet.
>>>>>>>>>>>>>>> Let's imagine I manage to do that. When you say there is
>>>>>>>>>>>>>>> nothing to do, I still must have a PDBFileBuilder to copy the
>>>>>>>>>>>>>>> CodeView data into the EXE PDB, right? I cannot insert them
>>>>>>>>>>>>>>> easily into the EXE PDB in another way?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Thu, Jan 17, 2019 at 8:01 PM Zachary Turner <
>>>>>>>>>>>>>>> zturner at google.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Well, is it possible to just hook up the CodeView debug
>>>>>>>>>>>>>>>> info generator to MCJIT? If you're not jitting, and you just
>>>>>>>>>>>>>>>> compile something, we translate all of the LLVM metadata
>>>>>>>>>>>>>>>> into CodeView in the file CodeViewDebug.cpp. Then, the
>>>>>>>>>>>>>>>> object file just already has CodeView in it. If it's not
>>>>>>>>>>>>>>>> hard to do, this would probably be a better solution,
>>>>>>>>>>>>>>>> because you don't have to worry about *how* to translate
>>>>>>>>>>>>>>>> DWARF into CodeView, which is not always trivial.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> If you can configure this in MCJIT, you won't even need to
>>>>>>>>>>>>>>>> do anything, you can just open the ObjectFile, look for the
>>>>>>>>>>>>>>>> .debug$T and .debug$S sections, iterate over each one and
>>>>>>>>>>>>>>>> re-write their TypeIndices while copying them to the output
>>>>>>>>>>>>>>>> PDB file.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Thu, Jan 17, 2019 at 10:52 AM Vivien Millet <
>>>>>>>>>>>>>>>> vivien.millet at gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> OK, I understand better what you meant. In fact I don’t
>>>>>>>>>>>>>>>>> care about the pdb size, at least as a first step, so
>>>>>>>>>>>>>>>>> having duplicated symbols won’t be a problem for me.
>>>>>>>>>>>>>>>>> Concerning TypeIndices, my plan, if possible, is not to
>>>>>>>>>>>>>>>>> generate a pdb for my jit and merge it, but instead to
>>>>>>>>>>>>>>>>> extract debug info directly from a DwarfContext just after
>>>>>>>>>>>>>>>>> the llvm::object::ObjectFile is emitted by the JIT engine,
>>>>>>>>>>>>>>>>> and complete the EXE PDB I had rebuilt with PDBFileBuilder.
>>>>>>>>>>>>>>>>> Does that sound like a good bet to you? If I succeed in
>>>>>>>>>>>>>>>>> doing that, I think it could be a good extension to the
>>>>>>>>>>>>>>>>> debugging possibilities of MCJIT, if not an extension to
>>>>>>>>>>>>>>>>> pdbutil.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Thu, Jan 17, 2019 at 19:37, Zachary Turner <zturner at google.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Well, for example the TPI stream is just one big collection of types.
>>>>>>>>>>>>>>>>>> Presumably your JIT code will reuse some of the same types (perhaps
>>>>>>>>>>>>>>>>>> std::string, for example) as your non-jitted code.  Your jitted symbol
>>>>>>>>>>>>>>>>>> records in the object file (for example, a local variable of type
>>>>>>>>>>>>>>>>>> std::string in your jitted code) will refer to the type for std::string
>>>>>>>>>>>>>>>>>> by a TypeIndex, and your original PDB will also refer to std::string by
>>>>>>>>>>>>>>>>>> a different TypeIndex.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> In LLD, when we merge in types and symbols from each object file, we keep
>>>>>>>>>>>>>>>>>> a hash table of which types have already been seen, so that if we see the
>>>>>>>>>>>>>>>>>> same type again, we can just use the TypeIndex that we wrote for a
>>>>>>>>>>>>>>>>>> previous object file.  Then, when we add symbol records, we have to
>>>>>>>>>>>>>>>>>> update their fields that used the old TypeIndex to use the new TypeIndex
>>>>>>>>>>>>>>>>>> instead.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> De-duplicating, though, I suppose, is not strictly necessary; it will
>>>>>>>>>>>>>>>>>> just keep your PDB size down.  But you *will* need to at least re-write
>>>>>>>>>>>>>>>>>> the TypeIndices from the jitted code.  For example, you may decide that
>>>>>>>>>>>>>>>>>> instead of de-duplicating, you just append them all to the end of the TPI
>>>>>>>>>>>>>>>>>> stream (where all the types go in a PDB) to keep things simple.  Since
>>>>>>>>>>>>>>>>>> they were in a different position before, they now have different
>>>>>>>>>>>>>>>>>> TypeIndices.  So you will need to re-write all TypeIndices so that they
>>>>>>>>>>>>>>>>>> are correct after the merge.  Both types and symbols can refer to types,
>>>>>>>>>>>>>>>>>> so you will need to do this both for the types of the jitted code and
>>>>>>>>>>>>>>>>>> for the symbols of the jitted code.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Let me know if that makes sense.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Thu, Jan 17, 2019 at 10:24 AM Vivien Millet <vivien.millet at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> OK, I see. What do you mean by “making sure to de-duplicate records as
>>>>>>>>>>>>>>>>>>> necessary”?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Thu, Jan 17, 2019 at 19:09, Zachary Turner <zturner at google.com> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> It's possible in theory to support incremental updates to a PDB (the
>>>>>>>>>>>>>>>>>>>> file format is designed specifically with that in mind).  But this
>>>>>>>>>>>>>>>>>>>> functionality was never added to the PDB library: since lld doesn't
>>>>>>>>>>>>>>>>>>>> support incremental linking, we never really needed it.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> The "dumb" way would be to just create a new PDB file and build it
>>>>>>>>>>>>>>>>>>>> using the old contents and the new contents (making sure to
>>>>>>>>>>>>>>>>>>>> de-duplicate records as necessary).
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Supporting incremental updates should be possible, but most of LLVM's
>>>>>>>>>>>>>>>>>>>> file I/O abstractions are based around mmapping a file and writing to
>>>>>>>>>>>>>>>>>>>> it, which doesn't work when you don't know the file size in advance.
>>>>>>>>>>>>>>>>>>>> So there would be some interesting problems to solve here.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Thu, Jan 17, 2019 at 10:03 AM Vivien Millet <vivien.millet at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Hi Zachary!
>>>>>>>>>>>>>>>>>>>>> Is there a way to easily create a new PDBFileBuilder from an existing
>>>>>>>>>>>>>>>>>>>>> PDBFile, or can/should I do the translation myself?
>>>>>>>>>>>>>>>>>>>>> I would like to start from a builder filled with the EXE PDB data and
>>>>>>>>>>>>>>>>>>>>> then complete its DBI stream with the JIT module/symbols.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Wed, Jan 16, 2019 at 23:41, Vivien Millet <vivien.millet at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Thank you Zachary!
>>>>>>>>>>>>>>>>>>>>>> I will have some soon, I think.
>>>>>>>>>>>>>>>>>>>>>> I first need to explore the llvm-pdbutil code more, because I don't
>>>>>>>>>>>>>>>>>>>>>> even know where to start with the PDB API.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Wed, Jan 16, 2019 at 22:51, Zachary Turner <zturner at google.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Sure. Along the way I’m happy to answer any specific questions you
>>>>>>>>>>>>>>>>>>>>>>> might have too, even if it’s for your downstream project.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On Wed, Jan 16, 2019 at 1:38 PM Vivien Millet <vivien.millet at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> I would be up for improving pdbutil, but I doubt I have enough
>>>>>>>>>>>>>>>>>>>>>>>> knowledge or time to provide the complete merge feature; it would
>>>>>>>>>>>>>>>>>>>>>>>> still be a very specific kind of merge as you describe it. Anyway,
>>>>>>>>>>>>>>>>>>>>>>>> I could start by trying to do it in my JIT compiler and then, once I
>>>>>>>>>>>>>>>>>>>>>>>> get something working (if that happens :)), come back to you with
>>>>>>>>>>>>>>>>>>>>>>>> the piece of code and see whether it is worth integrating into
>>>>>>>>>>>>>>>>>>>>>>>> pdbutil, and how.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Jan 16, 2019 at 22:12, Zachary Turner <zturner at google.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Well, that’s certainly possible, but improving llvm-pdbutil is
>>>>>>>>>>>>>>>>>>>>>>>>> another possibility. Doing it directly in your JIT compiler will
>>>>>>>>>>>>>>>>>>>>>>>>> probably save you time, though, since you won’t have to worry
>>>>>>>>>>>>>>>>>>>>>>>>> about writing tests and going through code review.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Jan 16, 2019 at 1:01 PM Vivien Millet <vivien.millet at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for the tips!
>>>>>>>>>>>>>>>>>>>>>>>>>> When you talk about doing all of this, I suppose you mean using
>>>>>>>>>>>>>>>>>>>>>>>>>> llvm/DebugInfo/PDB, picking code here and there to generate the
>>>>>>>>>>>>>>>>>>>>>>>>>> PDB in memory, reading the executable's PDB, and performing the
>>>>>>>>>>>>>>>>>>>>>>>>>> merge directly in my JIT compiler, right? Not using pdbutil?
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> On Tue, Jan 15, 2019 at 22:49, Zachary Turner <zturner at google.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> On Tue, Jan 15, 2019 at 2:50 AM Vivien Millet <vivien.millet at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hello Zachary!
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for your time!
>>>>>>>>>>>>>>>>>>>>>>>>>>>> So you are one of the happy guys who suffered from the lack of
>>>>>>>>>>>>>>>>>>>>>>>>>>>> PDB format information :)
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Yes, that would be me :)
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> To be honest I'm really a beginner in the PDB stuff; I just read
>>>>>>>>>>>>>>>>>>>>>>>>>>>> some LLVM documentation to understand what went wrong when
>>>>>>>>>>>>>>>>>>>>>>>>>>>> merging my PDBs.
>>>>>>>>>>>>>>>>>>>>>>>>>>>> In my case, what my team and I are trying to achieve is this:
>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Run our application under the Visual Studio debugger
>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Generate JIT code (using LLVM MCJIT)
>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Then, either:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>     - export a COFF obj file with DWARF information and then
>>>>>>>>>>>>>>>>>>>>>>>>>>>>       convert it with cv2pdb to obtain a PDB of my JIT symbols
>>>>>>>>>>>>>>>>>>>>>>>>>>>>       (what I do now)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>     - export my JIT debug info directly to PDB (what I would
>>>>>>>>>>>>>>>>>>>>>>>>>>>>       like to do, if you have an idea how)
>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Detach the Visual Studio debugger
>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Merge my JIT PDB into a copy of the executable's PDB (where
>>>>>>>>>>>>>>>>>>>>>>>>>>>>   things start to go bad)
>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Replace the original executable with the copy (creating a
>>>>>>>>>>>>>>>>>>>>>>>>>>>>   backup of the original)
>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Reattach the Visual Studio debugger to my executable (loading
>>>>>>>>>>>>>>>>>>>>>>>>>>>>   the new PDB version)
>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Debug JIT code with Visual Studio.
>>>>>>>>>>>>>>>>>>>>>>>>>>>> - On each JIT rebuild, restart these steps from the original
>>>>>>>>>>>>>>>>>>>>>>>>>>>>   native executable PDB to avoid merge conflicts between the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>   multiple JIT iterations
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Yeah, it's an interesting use case.  It makes me think it would be
>>>>>>>>>>>>>>>>>>>>>>>>>>> nice if the PDB format supported some way of having a symbol which
>>>>>>>>>>>>>>>>>>>>>>>>>>> simply refers to another PDB file. That way you could re-write that
>>>>>>>>>>>>>>>>>>>>>>>>>>> PDB file at runtime once all your code is jitted, and when the
>>>>>>>>>>>>>>>>>>>>>>>>>>> debugger tries to look up that symbol, it finds a record that tells
>>>>>>>>>>>>>>>>>>>>>>>>>>> it to go check the other PDB file.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> So, here are the things I think you would need to do:
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> 1) Create a JIT module in the module list with a unique name.  All
>>>>>>>>>>>>>>>>>>>>>>>>>>> symbols will go here.  llvm-pdbutil dump -modules shows you the
>>>>>>>>>>>>>>>>>>>>>>>>>>> list.  Be careful about putting it at the end, though, because
>>>>>>>>>>>>>>>>>>>>>>>>>>> there's already one at the end called * LINKER * that is kind of
>>>>>>>>>>>>>>>>>>>>>>>>>>> special.  On the other hand, you don't want to put it first,
>>>>>>>>>>>>>>>>>>>>>>>>>>> because it means you will have to do lots of fixups on the EXE
>>>>>>>>>>>>>>>>>>>>>>>>>>> PDB.  It's probably best to add it right before the linker module;
>>>>>>>>>>>>>>>>>>>>>>>>>>> this has the least chance of breaking anything.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> 2) In the debug stream for this module, add all symbols.  You will
>>>>>>>>>>>>>>>>>>>>>>>>>>> need to fix up their type indices.  As you noticed, llvm-pdbutil
>>>>>>>>>>>>>>>>>>>>>>>>>>> already merges type information from the JIT PDB, so after merging,
>>>>>>>>>>>>>>>>>>>>>>>>>>> the type indices in the EXE PDB will be different than they were in
>>>>>>>>>>>>>>>>>>>>>>>>>>> the JIT PDB, but the symbol records will still refer to the JIT PDB
>>>>>>>>>>>>>>>>>>>>>>>>>>> type indices.  So these need to be fixed up.  LLD already has code
>>>>>>>>>>>>>>>>>>>>>>>>>>> to do this; you can probably borrow a similar algorithm with some
>>>>>>>>>>>>>>>>>>>>>>>>>>> slight modifications (lld/COFF/PDB.cpp, search for
>>>>>>>>>>>>>>>>>>>>>>>>>>> mergeSymbolRecords).
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> 3) Merge in the new section contributions and section map.  See LLD
>>>>>>>>>>>>>>>>>>>>>>>>>>> again for how to modify these.  Hopefully the object file you
>>>>>>>>>>>>>>>>>>>>>>>>>>> exported contains relocated symbol addresses so you don't have to
>>>>>>>>>>>>>>>>>>>>>>>>>>> do any fixups here.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> 4) Merge in the publics and globals.  This shouldn't be too hard; I
>>>>>>>>>>>>>>>>>>>>>>>>>>> think you can just iterate over them in the JIT PDB and add them to
>>>>>>>>>>>>>>>>>>>>>>>>>>> the new EXE PDB.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> You're kind of in uncharted territory here, so this is just a rough
>>>>>>>>>>>>>>>>>>>>>>>>>>> idea of what needs to be done.  There may be other issues that you
>>>>>>>>>>>>>>>>>>>>>>>>>>> don't encounter until you actually try it out.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Unfortunately I don't personally have the time to work on this, but
>>>>>>>>>>>>>>>>>>>>>>>>>>> it sounds neat, and I'm happy to help if you run into questions or
>>>>>>>>>>>>>>>>>>>>>>>>>>> problems along the way.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>

Zachary Turner via llvm-dev

2019-Jan-28 19:57 UTC


[llvm-dev] [llvm-pdbutil] : merge not working properly

Two ideas for checking the validity of the records:

1) Use llvm-pdbutil with the "pretty" subcommand.  This will use the DIA SDK,
which as far as I know is the same as what the VS debugger uses.  It won't
really tell you specifically what is wrong, but it can give you some hints,
because if it crashes somewhere, then at least you know *where* it's
crashing.  For example, if it's crashing on trying to get the address of a
symbol, then you might check the symbol record's section/offset, look at
the section map and section contributions, compare those to your
executable, and make sure everything matches up.

2) Use cvdump.  This can also give a hint about which specific records it
doesn't like.

If you're not using C++ mangling, then that could definitely be a problem.
We have MicrosoftMangle.cpp which is in clang, so it might take some work
to hook it up in your case, as it's meant to be used from the C++ compiler
frontend, and not from the JIT.  But if you can get that to work, that
would be a good thing to try.

When it comes to things like this, my strategy is always that if I have to
ask myself "I'm not doing X and I know normally the compiler does X, I
wonder if I should do that?" then the answer is always "yes".  It's only
after I have no other ideas and everything looks identical that I really
get stuck.

To answer your question about section contributions, they're necessary because
the debugger needs to be able to determine which "module" (i.e. source /
object file) a symbol comes from.  Imagine, for example, that the debugger
is trying to find what symbol is at a particular address, say 0x12345678.
Then it will subtract the load address of the module (let's say 0x10000000)
and get the address 0x02345678.  Then in order to find the symbol for this,
it has to first find the module.  So it looks in the section headers
(llvm-pdbutil dump -section-headers), and finds which section would contain
the address 0x02345678.  Pretend that it finds that it is section #1, the
.text section, and the section header says the base address is 0x02300000.
So now it knows that it is at offset 0x45678 in section #1.

It can then binary search the section contributions, since they are sorted by
module and offset, to find an entry that contains its size and section
properties.

In the reverse direction, many symbol records contain a section/offset.
So in this case, if it has one of these records, it can binary search in
the section contribs to find the properties.
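
To make the lookup above concrete, here is a minimal, self-contained sketch in
plain C++.  It is only an illustration of the idea: the structs and the
function names (SectionHeader, SectionContrib, toSectionOffset, findModule)
are hypothetical simplifications and are not the LLVM PDB library types, and
it assumes the contributions are kept ordered by section index and then by
offset so that a binary search is valid.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <tuple>
#include <vector>

// Hypothetical, simplified stand-ins for the real structures (not LLVM types).
struct SectionHeader {
  uint32_t VirtualAddress; // RVA at which the section starts
  uint32_t VirtualSize;    // size of the section once loaded
};

struct SectionContrib {
  uint16_t Section; // 1-based section index
  uint32_t Offset;  // start offset inside that section
  uint32_t Size;    // number of bytes contributed
  uint16_t Module;  // index of the module (object file) that owns the bytes
};

struct SectionOffset {
  uint16_t Section;
  uint32_t Offset;
};

// Ordering assumed for the contribution table: section first, then offset.
static bool contribLess(const SectionContrib &A, const SectionContrib &B) {
  return std::tie(A.Section, A.Offset) < std::tie(B.Section, B.Offset);
}

// Step 1: virtual address -> RVA -> (section, offset) via the section headers.
std::optional<SectionOffset>
toSectionOffset(uint64_t Address, uint64_t LoadAddress,
                const std::vector<SectionHeader> &Headers) {
  if (Address < LoadAddress)
    return std::nullopt; // address is below the module: no RVA exists
  uint64_t RVA = Address - LoadAddress;
  for (std::size_t I = 0; I < Headers.size(); ++I) {
    const SectionHeader &H = Headers[I];
    if (RVA >= H.VirtualAddress && RVA - H.VirtualAddress < H.VirtualSize)
      return SectionOffset{static_cast<uint16_t>(I + 1),
                           static_cast<uint32_t>(RVA - H.VirtualAddress)};
  }
  return std::nullopt; // RVA is not covered by any section
}

// Step 2: binary search the sorted contributions for the owning module.
std::optional<uint16_t> findModule(const std::vector<SectionContrib> &Contribs,
                                   SectionOffset SO) {
  assert(std::is_sorted(Contribs.begin(), Contribs.end(), contribLess));
  // First contribution that starts strictly after SO; the candidate precedes it.
  auto It = std::upper_bound(
      Contribs.begin(), Contribs.end(), SO,
      [](const SectionOffset &K, const SectionContrib &C) {
        return std::tie(K.Section, K.Offset) < std::tie(C.Section, C.Offset);
      });
  if (It == Contribs.begin())
    return std::nullopt;
  --It;
  if (It->Section == SO.Section && SO.Offset - It->Offset < It->Size)
    return It->Module;
  return std::nullopt; // falls in a gap between contributions
}
```

With the numbers from the example (address 0x12345678, load address
0x10000000, a single section starting at RVA 0x02300000), toSectionOffset
yields section 1, offset 0x45678, and findModule then returns whichever
module's contribution covers that offset.  An address that no contribution
covers (for instance JIT code placed outside every section of the module)
simply has no owner in this scheme.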

On Mon, Jan 28, 2019 at 11:21 AM Vivien Millet <vivien.millet at gmail.com>
wrote:

> Hello Zachary,
> Sorry for replying so late, but I've spent a week thinking and working
> hard on your "dll memory buffer" idea to see if it works and give you
> feedback!
> And it works pretty well so far.
> I'm sharing with the list what I did:
> - create a .ASM file full of "int 3" instructions (to ensure that if we
>   execute past the boundaries we instantly break).
> - Compile this to a .DLL
> - use a hex editor to change the ".text" section Characteristics from
>   Read/Execute to Read/Write/Execute
> - run my program, which does JIT compilation
> - get the start RVA of the .text section (which is always 0x1000 in my
>   case)
> - Load the .DLL and use ModuleAddress+RVA as a memory buffer in a
>   custom DllMemMgr I give to MCJIT
> - On NotifyObjectEmitted, replace the DLL's PDB with a custom one I build
>   myself with your PDBFileBuilder
> - On finalizing memory, first reload the DLL to trigger Visual Studio PDB
>   reloading (not working, don't know why yet), ensure it goes into the same
>   virtual space, and protect the memory using VirtualProtect.
> - Place a breakpoint in my JIT file (it displays "loaded"), execute the JIT
>   code, and it breaks
> ...and ....
> * drums *
> Visual Studio CRASHES when I open the Watch window or Locals/Autos/etc.,
> every time, and I don't know why.
> I noticed, when compiling the C++ equivalent of my JIT program, that a simple
> "int param" is written with size=20 in the C++ PDB and size=16 in my JIT PDB.
> Do you know what this "size" attribute represents in the S_LOCAL symbol
> section? I suspect the symbol section is the cause of the watch issue, but I
> am not sure. If you have an idea...
> I also had an "illegal instruction" exception when stepping with F10 after
> the break, but when I'm not breaking, the code runs well.
>
> A lot of mysteries there again...
>
> Visual Studio displays the disassembly correctly, with the debug lines at the
> right place, etc., so I don't get why Visual Studio crashes.
> Another issue is that I always have to remove and re-add my breakpoint for
> Visual Studio to really break, even if it says "I'm a good breakpoint at
> that good address". Is this related to file checksums? It seems
> mine has a "none" checksum, so I suspect this to be the problem, but I
> don't know how to fix it, because I added the checksum with addChecksum with
> the right file name and I still get "none" in the dump...
> So right now I'm quite hopeful because I get something reacting in Visual
> Studio, but I have no idea why it crashes.
>
> Have you already encountered this issue when testing your generated PDBs?
> Do you know the role of Section Contributions in the PDB/debugging session?
> Any tip for checking symbol record validity in the dump? It looks good to
> me: no ??? anywhere, and no Error.
>
> Thank you !
>
>
>
>
>
> On Wed, Jan 23, 2019 at 22:29, Zachary Turner <zturner at google.com> wrote:
>
>> .text is where code goes. I don't know why it's called .text; it's just
>> been that way for many decades and the name stuck around.  But actually you
>> can call the section whatever you want.  Maybe it's even better to call it
>> something other than .text, because .text is where your DllMain and other
>> stuff will be.  You could call it .jit if you wanted to.  You should be
>> able to create the section with whatever flags you want.  You'll need to
>> produce a jit_code.obj, probably compiled from assembly, that makes a section
>> named .jit and sets the flags to be executable (you can just copy the flags
>> from a normal .text section of some other program).  Then link this file
>> together with a jitted_code_main.obj, which you compiled from a simple
>> source file with a DllMain function that does nothing.  This would make
>> jitted_code.dll; then have your program link against jitted_code.lib.
>>
>> Right now you jit the code into some buffer that you created with
>> VirtualAlloc.  If you do the above, it will load jitted_code.dll into
>> memory and the OS loader will allocate some memory for each section.  So
>> this would be like your VirtualAlloc: you can just find the address of the
>> .jit section and use that buffer instead of the VirtualAlloc buffer as the
>> target address of your jit operations.
>>
>> Again, this is just an idea, no promises it will work, but unfortunately
>> that's kind of the best you can do when dealing with closed source things:
>> just make guesses and hope for the best.
>>
>>
>>
>> On Wed, Jan 23, 2019 at 12:42 PM Vivien Millet <vivien.millet at gmail.com>
>> wrote:
>>
>>> (Yes, you are right, this is my fault)
>>> Regarding the string table, it only seems to contain file-related
>>> information in every PDB I am using, and it looks correct, but I will check
>>> it.
>>> I looked at the PDB.cpp code about checksums and tables. I copied some
>>> stuff and got things wrong according to cvdump; then I simplified the
>>> process of copying the table and it worked (in cvdump it finds the file
>>> matching each line, etc.), so I suspect this is also correct.
>>>
>>> All the streams look good, but I will check deeper!
>>>
>>> What you say about RVAs and modules seems right; this is what I'm
>>> afraid of, doing all of this for nothing, or almost.
>>>
>>> Your idea of a .text section in a separate DLL looks good,
>>> but will it be executable memory? .text is where static strings go, right?
>>> When you say putting my JIT code in there, do you mean writing it once
>>> jitted_code.dll is loaded in memory, or into the .dll file directly before
>>> loading it? In the first scenario I wonder if the section will be
>>> executable; in the second scenario I can’t do it, because it would require
>>> perfect linking with the other code my JIT points to.
>>>
>>> On Wed, Jan 23, 2019 at 20:57, Zachary Turner <zturner at google.com> wrote:
>>>
>>>> (BTW, I'm adding llvm-dev back to the list, I didn't notice it got
>>>> taken off.  In general I try to keep the list on all emails, even if it's
>>>> extremely technical and specific, because someday someone else will try to
>>>> do this, and it'll be nice if they can read the whole thread).
>>>>
>>>> I can think of a couple of things that might be wrong:
>>>>
>>>> 1) If the string table is in a different order, then anything that
>>>> refers to the string table needs to be changed to refer to the new offset.
>>>> If the string "foo" is at offset 12 in the old PDB, but offset 15 in the
>>>> new PDB, then somewhere there is a record which is going to look at offset
>>>> 12 and expect to find something, and that will mess up.  The main place
>>>> this is important is in the File Checksums table: there is an entry that
>>>> says which file it is a checksum for, and that refers to the string table.
>>>> However, it's possible for certain symbol records to refer to the string
>>>> table too.  See lld/COFF/PDB.cpp and Ctrl+F for "PDBStrTab" and you will
>>>> find some information about this.
>>>>
>>>> 2) When you run `llvm-pdbutil dump -streams` on the copied PDB, do all
>>>> of them show a reasonable description?  Are there any streams that say
>>>> (???)?  If so, that's a problem.
>>>>
>>>> > will Visual Studio consider a symbol file broken if the address
>>>> > goes beyond the official module address range (the compiled one), because
>>>> > my JIT code is allocated after the end of the module with VirtualAlloc
>>>>
>>>> That is a good question, and part of why my job is so difficult,
>>>> because I can't look at their code.  But I think the answer is "probably".
>>>> The debugger has to have some way to convert an address in your running
>>>> process into a symbol and offset, because that's how all debug info is
>>>> represented in the PDB.  So if there is no module, then there is no RVA
>>>> (because the R in RVA means relative, and what would it be relative to?).
>>>>
>>>> One idea to test this would be to create a DLL called jitted_code.dll,
>>>> give it a huuuuuge .text section (probably just a .asm file that uses some
>>>> assembly directives to allocate a very large series of null bytes), and
>>>> then write your jit code into that area.  This way you would not need to
>>>> modify the existing PDB; you would only need to make a new PDB called
>>>> jitted_code.pdb with 1 module, and those symbols could have meaningful
>>>> RVAs.  And you might not even need to detach the debugger if you do things
>>>> this way, because you could just right click the jitted_code.dll module in
>>>> the modules window and choose Load Symbols.
>>>>
>>>>
>>>>
>>>> On Wed, Jan 23, 2019 at 11:13 AM Vivien Millet <vivien.millet at gmail.com>
>>>> wrote:
>>>>
>>>>> Yes, that's it; I just make a copy of a PDB generated by link.exe
>>>>> (the Microsoft one).
>>>>> Using llvm-pdbutil to compare is what I do, except I do it with "-all".
>>>>> And I get almost everything the same: same number of streams, the section
>>>>> map looks good, the string table looks good (even if not in the same
>>>>> order), same number of modules with practically the same symbols and
>>>>> subsections. This is why I'm stuck: I am missing something, but I can't
>>>>> see what, because I don't know where to look. Visual Studio works with it
>>>>> and I can debug my original exe, but probably without the globals...
>>>>> And the other problem is that a difference between the dumps is not
>>>>> necessarily a bug, because the builder may generate new hash values,
>>>>> reorder streams, modules, etc.
>>>>>
>>>>> For now I have given up on having publics and globals streams and attacked
>>>>> the real goal: inserting my JIT CodeView into the PDB. Again I have done
>>>>> « something », but as I don’t understand how the format works, I don’t
>>>>> have it working in Visual Studio... except once: a single time it worked
>>>>> and the breakpoint turned on in the UI (even if the RVA was broken for the
>>>>> instructions), but it happened only that one time, and I got depressed the
>>>>> next times. cvdump displays it all as « correct », with no corrupt stuff
>>>>> apparently. But what I do is probably wrong somewhere. I take
>>>>> .debug$S and .debug$T as is, without relocations, just to see. What I
>>>>> really don’t know is: will Visual Studio consider a symbol file
>>>>> broken if the address goes beyond the official module address range (the
>>>>> compiled one)? My JIT code is allocated after the end of the module
>>>>> with VirtualAlloc.
>>>>> Another thing I don’t get is the section contribution: what is it
>>>>> exactly? I inserted section contribs for all sections except the debug$
>>>>> ones, but I don’t know what I’m really doing, and that is my usual problem
>>>>> in implementing this JIT feature...
>>>>> I also don’t know what relocations are inside the CodeView format,
>>>>> what the difference between an RVA and a relocation is, or whether any of
>>>>> this matters for the CodeView part I need to insert in the PDB. I
>>>>> don’t see why Visual Studio needs more than just an RVA<->Line mapping.
>>>>> Being so ignorant and trying to guess what Visual Studio does is really
>>>>> driving me crazy...
>>>>>
>>>>> On Mon, Jan 21, 2019 at 19:50, Zachary Turner <zturner at google.com> wrote:
>>>>>
>>>>>> So if I understand correctly, you're basically just trying to
>>>>>> implement something like a PDB *copy*, just as a test to see if you can get
>>>>>> it to work.  So you generate a PDB with cl/link or clang-cl/lld-link, then
>>>>>> try to copy it using your tool, then see if it still works.
>>>>>>
>>>>>> If this is correct, and it's not working, then there is probably just
>>>>>> something you didn't copy.  Neither publics nor globals actually contain
>>>>>> their own data; instead they just refer to records from the corresponding
>>>>>> module stream.  So an S_PROCREF for the function "main" might have fields
>>>>>> that say "the name of the function is main, and it's at offset 20 of module
>>>>>> 1".  So, if there is no module 1, or if offset 20 of module 1 is not
>>>>>> actually an S_GPROC32 for the function main, then it will be broken.
>>>>>>
>>>>>> Did you also go through each module in the source PDB, add a new
>>>>>> module in the target PDB, then copy all of the symbols for each one?
>>>>>>
>>>>>> The best way to find differences is by using llvm-pdbutil on the
>>>>>> source and target PDBs and looking for things that look different.  For
>>>>>> example, I'd start with llvm-pdbutil dump -streams and then see if they
>>>>>> even have all the same streams.  If one of them is missing streams, that's
>>>>>> a good place to start.  If they have the same streams, then look for ones
>>>>>> where the size is different.  Then drill into those to see why the size is
>>>>>> different.
>>>>>>
>>>>>> LMK if that helps.
>>>>>>
>>>>>> On Mon, Jan 21, 2019 at 10:03 AM Vivien Millet <
>>>>>> vivien.millet at gmail.com> wrote:
>>>>>>
>>>>>>> For now I'm not merging my JIT CodeView section; I'm only trying to build
>>>>>>> a pure copy of an existing PDB using the XxxBuilder classes (PDBFileBuilder
>>>>>>> & Co, reading from a PDBFile) and check whether Visual Studio will eat it.
>>>>>>> For publics and globals, what I do is naive: I use the
>>>>>>> GSIStreamBuilder and pray :)
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>   if (File.hasPDBGlobalsStream() && File.getPDBGlobalsStream()) {
>>>>>>>     GSIStreamBuilder &builder = this->getGsiBuilder();
>>>>>>>     GlobalsStream &stream = *File.getPDBGlobalsStream();
>>>>>>>     SymbolStream &SymbolRecords = cantFail(File.getPDBSymbolStream());
>>>>>>>
>>>>>>>     for (uint32_t PubSymOff : stream.getGlobalsTable()) {
>>>>>>>       CVSymbol Sym = SymbolRecords.readRecord(PubSymOff);
>>>>>>>       builder.addGlobalSymbol(Sym);
>>>>>>>     }
>>>>>>>   }
>>>>>>>   if (File.hasPDBPublicsStream() && File.getPDBPublicsStream()) {
>>>>>>>     GSIStreamBuilder &builder = this->getGsiBuilder();
>>>>>>>     PublicsStream &stream = *File.getPDBPublicsStream();
>>>>>>>     SymbolStream &SymbolRecords = cantFail(File.getPDBSymbolStream());
>>>>>>>
>>>>>>>     std::vector<PublicSym32> Publics;
>>>>>>>
>>>>>>>     for (uint32_t PubSymOff : stream.getPublicsTable()) {
>>>>>>>       PublicSym32 Pub = cantFail(
>>>>>>>           llvm::codeview::SymbolDeserializer::deserializeAs<PublicSym32>(
>>>>>>>               SymbolRecords.readRecord(PubSymOff)));
>>>>>>>       Publics.push_back(Pub);
>>>>>>>     }
>>>>>>>
>>>>>>>     if (!Publics.empty()) {
>>>>>>>       // Sort the public symbols and add them to the stream.
>>>>>>>       std::sort(Publics.begin(), Publics.end(),
>>>>>>>                 [](const PublicSym32 &L, const PublicSym32 &R) {
>>>>>>>                   return L.Name < R.Name;
>>>>>>>                 });
>>>>>>>       for (const PublicSym32 &Pub : Publics)
>>>>>>>         builder.addPublicSymbol(Pub);
>>>>>>>     }
>>>>>>>   }
>>>>>>>
>>>>>>> Is it what you meant ?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Jan 21, 2019 at 18:50, Zachary Turner <zturner at google.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Also, even if symbolGoesInGlobalsStream returns true, you can’t
>>>>>>>> just copy it. Functions, for example, which are S_GPROC32 or S_LPROC32 in
>>>>>>>> the module stream, are S_PROCREF in the globals stream. Similarly,
>>>>>>>> *everything* in the publics stream is S_PUB32. So you need to convert each
>>>>>>>> symbol to the proper type for the stream it’s going to go in.
>>>>>>>> On Mon, Jan 21, 2019 at 9:46 AM Zachary Turner <zturner at google.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Publics are basically a list of everything that has a mangled
>>>>>>>>> name. To be honest, I don’t know what the debugger uses this for.
>>>>>>>>>
>>>>>>>>> Globals is essentially every symbol in the PDB in one large table.
>>>>>>>>> The reason this is important is that if you type “foo” in the watch
>>>>>>>>> window, the debugger doesn’t necessarily know what compiland foo comes
>>>>>>>>> from. So it has to have a way to find everything in the entire program no
>>>>>>>>> matter what compiland it came from. That’s what the globals are.
>>>>>>>>>
>>>>>>>>> Both publics and globals are hash tables, so one possible reason
>>>>>>>>> there might be a problem is that you need to rehash the entire table. When
>>>>>>>>> you build your modified PDB, I would suggest starting with an empty publics
>>>>>>>>> / globals stream, adding all items from the first PDB by iterating over
>>>>>>>>> those records and using a GlobalsStreamBuilder, then adding all your jitted
>>>>>>>>> items separately, then writing it out. That should make sure it gets hashed
>>>>>>>>> correctly.
>>>>>>>>>
>>>>>>>>> Are you doing that?
>>>>>>>>>
>>>>>>>>> Btw, not all symbols belong in the globals / publics stream. Check
>>>>>>>>> the code in lld and search for symbolGoesInGlobalsStream and
>>>>>>>>> symbolGoesInPublicsStream to see the logic it uses.
>>>>>>>>> On Mon, Jan 21, 2019 at 8:36 AM Vivien Millet <
>>>>>>>>> vivien.millet at gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Zachary, sorry for disturbing you again..
>>>>>>>>>>
>>>>>>>>>> I've fixed some problems (StringTable, SectionMap and a few things
>>>>>>>>>> here and there..) and my converted PDB now seems to work inside
>>>>>>>>>> Visual Studio..
>>>>>>>>>> But I'm not sure I have full debug features, because I haven't
>>>>>>>>>> managed to translate Publics and Globals correctly. CVDump says the
>>>>>>>>>> PDB is corrupted, whereas PDBUTIL -dump displays them correctly.
>>>>>>>>>> I don't really understand what the Publics and Globals streams
>>>>>>>>>> are: whether the symbols are really in the corresponding streams or
>>>>>>>>>> are just references to somewhere else.
>>>>>>>>>> The LLVM documentation is not complete about these two streams, so
>>>>>>>>>> I'm a bit lost on how to handle them or find what is "corrupted"
>>>>>>>>>> according to CVDump.
>>>>>>>>>> I used LLD and yaml2pdb as examples to help me with some tough
>>>>>>>>>> conversions, but I noticed that in yaml2pdb there is no GsiStream
>>>>>>>>>> exported (no GsiBuilder use and no reference to Publics or Globals
>>>>>>>>>> anywhere). Is that intended/correct ?
>>>>>>>>>> Thanks, and sorry if I'm spamming a bit; this is my 99% time task
>>>>>>>>>> right now and being stuck without any clue is difficult :) But I
>>>>>>>>>> guess you experienced even more suffering when documentation didn't
>>>>>>>>>> exist at all !
>>>>>>>>>> Have a good day !
>>>>>>>>>>
>>>>>>>>>> On Sun, Jan 20, 2019 at 22:27, Vivien Millet <
>>>>>>>>>> vivien.millet at gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> ERRATUM, my bad: the pdb I tested is also corrupted according to
>>>>>>>>>>> cvdump.exe, I don't know why. I regenerated it and now I have a
>>>>>>>>>>> working dump. You don't need to fix anything.
>>>>>>>>>>>
>>>>>>>>>>> On Sun, Jan 20, 2019 at 20:26, Vivien Millet <
>>>>>>>>>>> vivien.millet at gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Zachary,
>>>>>>>>>>>> I've done a first step towards rewriting an existing PDBFile with
>>>>>>>>>>>> PDBFileBuilder. I got most of the work done, but I don't get as
>>>>>>>>>>>> much output as input (some streams are not mirrored for unknown
>>>>>>>>>>>> reasons and some data must be missing here and there...).
>>>>>>>>>>>> When I try to replace the original with the rebuilt one for
>>>>>>>>>>>> debugging, the pdb loads well but breakpoints fail to activate,
>>>>>>>>>>>> with an "unexpected symbol reader error while processing
>>>>>>>>>>>> foobar.exe". You probably know what it means or have already
>>>>>>>>>>>> encountered this error, I guess.
>>>>>>>>>>>> I also tried to create a minimal program to simplify comparisons
>>>>>>>>>>>> between the original and new PDB, but I get an error when dumping
>>>>>>>>>>>> the original pdb exported by Visual Studio with -all
>>>>>>>>>>>> (PublicsStream.cpp|98). I think it is a bug.
>>>>>>>>>>>> I've attached the related main.cpp and PDB to this email if you
>>>>>>>>>>>> want to check what the error is exactly (vs2017; x86 and x64 have
>>>>>>>>>>>> the same issues).
>>>>>>>>>>>> I've also attached my code (git diff). I added an « identity »
>>>>>>>>>>>> feature to pdbutil which uses the code I wrote to regenerate the
>>>>>>>>>>>> input pdb. You can use it to see what I have so far..
>>>>>>>>>>>> I’ve seen you recently added a fix related to FPO, but you say
>>>>>>>>>>>> it’s only for x86, so I don’t think it would change anything, but
>>>>>>>>>>>> who knows..
>>>>>>>>>>>> Anyway, if you have a moment to check my work so far and give me
>>>>>>>>>>>> feedback, it’s welcome, because I'm out of ideas about what is
>>>>>>>>>>>> going wrong..
>>>>>>>>>>>> Thanks, I'm going back to digging into the pdb mysteries !
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Jan 18, 2019 at 12:31, Vivien Millet <
>>>>>>>>>>>> vivien.millet at gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Ok ! It was just to be sure I understood correctly.
>>>>>>>>>>>>> Sorry for not replying right away, I wanted to try emitting
>>>>>>>>>>>>> CodeView first before continuing the discussion, and it was time
>>>>>>>>>>>>> for me to go to bed here..
>>>>>>>>>>>>> I just tried it now and it is very easy to switch to CodeView.
>>>>>>>>>>>>> For those interested: you just have to give your TargetTriple to
>>>>>>>>>>>>> the llvm::Module used for JIT and then call
>>>>>>>>>>>>> module->addModuleFlag(llvm::Module::Warning, "CodeView", 1) to
>>>>>>>>>>>>> tell the AsmPrinter that this module prefers CodeView instead of
>>>>>>>>>>>>> DWARF.
>>>>>>>>>>>>> I've checked the content of my .obj file, and there are valid
>>>>>>>>>>>>> .debug$T and .debug$S sections, so everything is going well so
>>>>>>>>>>>>> far.
>>>>>>>>>>>>> Now, as a parallel task, I will try to read the EXE PDB and
>>>>>>>>>>>>> re-export it "as is" to see if I break something in Visual
>>>>>>>>>>>>> Studio.
>>>>>>>>>>>>> If I succeed in doing that, it might be added as a feature to
>>>>>>>>>>>>> PDBFile or PDBFileBuilder to simplify the process for other
>>>>>>>>>>>>> users.
>>>>>>>>>>>>> I'll keep you posted.
>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Jan 17, 2019 at 20:50, Zachary Turner <
>>>>>>>>>>>>> zturner at google.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> When I say "nothing to do" I just mean that you won't have to
>>>>>>>>>>>>>> do anything to convert the record from one format (DWARF) to
>>>>>>>>>>>>>> another format (CodeView).  You will have a COFF object file
>>>>>>>>>>>>>> either on disk (probably named foo.obj or something) or in
>>>>>>>>>>>>>> memory.  And this object file will have a .debug$S section with
>>>>>>>>>>>>>> CodeView symbols and a .debug$T section with CodeView types.
>>>>>>>>>>>>>> Then you will still need to use the PDBFileBuilder to add these
>>>>>>>>>>>>>> records to the final PDB, but they will already be in the
>>>>>>>>>>>>>> correct format that PDBFileBuilder expects, you won't need to
>>>>>>>>>>>>>> convert them from DWARF (which is not trivial).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Thu, Jan 17, 2019 at 11:26 AM Vivien Millet <
>>>>>>>>>>>>>> vivien.millet at gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> That’s a good question. By default, when emitting the object
>>>>>>>>>>>>>>> file I choose COFF, but it embeds DWARF and not CodeView in the
>>>>>>>>>>>>>>> end.. there is probably a way to do it, or at least it must be
>>>>>>>>>>>>>>> implemented if it isn't yet..
>>>>>>>>>>>>>>> Let's imagine I manage to do that.. when you say there is
>>>>>>>>>>>>>>> nothing to do, I still need a PDBFileBuilder to copy the
>>>>>>>>>>>>>>> CodeView data into the EXE PDB, right ? There is no easier way
>>>>>>>>>>>>>>> to insert it into the EXE PDB ?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Thu, Jan 17, 2019 at 20:01, Zachary Turner <
>>>>>>>>>>>>>>> zturner at google.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Well, is it possible to just hook up the CodeView debug info
>>>>>>>>>>>>>>>> generator to MCJIT?  If you're not jitting, and you just
>>>>>>>>>>>>>>>> compile something, we translate all of the LLVM metadata into
>>>>>>>>>>>>>>>> CodeView in the file CodeViewDebug.cpp.  Then, the object file
>>>>>>>>>>>>>>>> just already has CodeView in it.  If it's not hard to do, this
>>>>>>>>>>>>>>>> would probably be a better solution, because you don't have to
>>>>>>>>>>>>>>>> worry about *how* to translate DWARF into CodeView, which is
>>>>>>>>>>>>>>>> not always trivial.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> If you can configure this in MCJIT, you won't even need to do
>>>>>>>>>>>>>>>> anything, you can just open the ObjectFile, look for the
>>>>>>>>>>>>>>>> .debug$T and .debug$S sections, iterate over each one and
>>>>>>>>>>>>>>>> re-write their TypeIndices while copying them to the output
>>>>>>>>>>>>>>>> PDB file.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Thu, Jan 17, 2019 at 10:52 AM Vivien Millet <
>>>>>>>>>>>>>>>> vivien.millet at gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Ok, I understand better what you meant. In fact I don’t care
>>>>>>>>>>>>>>>>> about the pdb size, at least as a first step, so having
>>>>>>>>>>>>>>>>> duplicated symbols won’t be a problem for me. Concerning
>>>>>>>>>>>>>>>>> TypeIndices, my plan, if possible, is not to generate a pdb
>>>>>>>>>>>>>>>>> for my jit and merge it, but instead to extract debug info
>>>>>>>>>>>>>>>>> directly from a DwarfContext just after the
>>>>>>>>>>>>>>>>> llvm::object::ObjectFile is emitted by the JIT engine, and
>>>>>>>>>>>>>>>>> complete the EXE PDB I had rebuilt with PDBFileBuilder. Does
>>>>>>>>>>>>>>>>> that sound like a good bet to you ? If I succeed in doing
>>>>>>>>>>>>>>>>> that, I think it could be a good extension to the debugging
>>>>>>>>>>>>>>>>> possibilities of MCJIT, if not an extension to pdbutil.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Thu, Jan 17, 2019 at 19:37, Zachary Turner <
>>>>>>>>>>>>>>>>> zturner at google.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Well, for example the TPI stream is just one big collection of
>>>>>>>>>>>>>>>>>> types.  Presumably your JIT code will reuse some of the same
>>>>>>>>>>>>>>>>>> types (perhaps, std::string for example) as your non-jitted
>>>>>>>>>>>>>>>>>> code.  Your jitted symbol records in the object file (for
>>>>>>>>>>>>>>>>>> example, a local variable of type std::string in your jitted
>>>>>>>>>>>>>>>>>> code) will refer to the type for std::string by a TypeIndex,
>>>>>>>>>>>>>>>>>> and your original PDB will also refer to std::string by a
>>>>>>>>>>>>>>>>>> different TypeIndex.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> In LLD, when we merge in types and symbols from each object
>>>>>>>>>>>>>>>>>> file, we keep a hash table of which types have already been
>>>>>>>>>>>>>>>>>> seen, so that if we see the same type again, we can just use
>>>>>>>>>>>>>>>>>> the TypeIndex that we wrote on a previous object file.  Then,
>>>>>>>>>>>>>>>>>> when we add symbol records, we have to update its fields that
>>>>>>>>>>>>>>>>>> used the old TypeIndex to use the new TypeIndex instead.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> De-duplicating though, I suppose, is not strictly necessary,
>>>>>>>>>>>>>>>>>> it will just keep your PDB size down.  But you *will* need to
>>>>>>>>>>>>>>>>>> at least re-write the TypeIndexes from the jitted code.  For
>>>>>>>>>>>>>>>>>> example, you may decide that instead of de-duplicating, you
>>>>>>>>>>>>>>>>>> just append them all to the end of the TPI stream (where all
>>>>>>>>>>>>>>>>>> the types go in PDB) to keep things simple.  Since they were
>>>>>>>>>>>>>>>>>> in a different position before, they now have different
>>>>>>>>>>>>>>>>>> TypeIndices.  So you will need to re-write all TypeIndices so
>>>>>>>>>>>>>>>>>> that they are correct after the merge.  Both types and symbols
>>>>>>>>>>>>>>>>>> can refer to types, so you will need to do this both for the
>>>>>>>>>>>>>>>>>> types of the jitted code as well as the symbols of the jitted
>>>>>>>>>>>>>>>>>> code.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Let me know if that makes sense.
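
A minimal sketch of the "append everything to the end of the TPI stream and rewrite the indices" option described above. It is an illustration only, not LLD's algorithm: Record, remapTypeIndex and remapRecords are hypothetical names, and a record is reduced to the list of type indices it refers to. The one CodeView fact it relies on is that indices below 0x1000 denote built-in simple types and keep the same value in every PDB.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// CodeView reserves indices below 0x1000 for built-in ("simple") types.
constexpr uint32_t FirstNonSimpleIndex = 0x1000;

// Hypothetical, simplified record (type or symbol): only the type indices
// it refers to are modelled here.
struct Record {
  std::vector<uint32_t> TypeRefs;
};

// Map an index from the JIT object's numbering into the merged TPI stream,
// assuming the JIT types are appended, in order, after ExistingTypeCount
// types that are already present in the EXE PDB.
uint32_t remapTypeIndex(uint32_t Old, uint32_t ExistingTypeCount) {
  if (Old < FirstNonSimpleIndex)
    return Old; // simple types are not stored in the stream, nothing moves
  assert(Old <= UINT32_MAX - ExistingTypeCount && "type index overflow");
  return Old + ExistingTypeCount;
}

// Rewrite every type reference of every record in place. The same pass is
// needed for the JIT type records and for the JIT symbol records.
void remapRecords(std::vector<Record> &Records, uint32_t ExistingTypeCount) {
  for (Record &R : Records)
    for (uint32_t &TI : R.TypeRefs)
      TI = remapTypeIndex(TI, ExistingTypeCount);
}
```

With 50 types already in the EXE PDB, the first JIT type (0x1000) becomes 0x1032, while a simple type such as 0x0074 is left alone. De-duplication would replace the fixed shift with a per-type lookup table.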
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Thu, Jan 17, 2019 at 10:24 AM Vivien Millet <
>>>>>>>>>>>>>>>>>> vivien.millet at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Ok, I see..
>>>>>>>>>>>>>>>>>>> What do you mean by “making sure to de-duplicate records as
>>>>>>>>>>>>>>>>>>> necessary” ?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Thu, Jan 17, 2019 at 19:09, Zachary Turner <
>>>>>>>>>>>>>>>>>>> zturner at google.com> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> It's possible in theory to support incremental updates to a PDB
>>>>>>>>>>>>>>>>>>>> (the file format is designed specifically with that in mind).
>>>>>>>>>>>>>>>>>>>> But this functionality was never added to the PDB library:
>>>>>>>>>>>>>>>>>>>> since lld doesn't support incremental linking, we never really
>>>>>>>>>>>>>>>>>>>> needed it.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> The "dumb" way would be to just create a new PDB file, build it
>>>>>>>>>>>>>>>>>>>> using the old contents and the new contents (making sure to
>>>>>>>>>>>>>>>>>>>> de-duplicate records as necessary).
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Supporting incremental updates should be possible, but most of
>>>>>>>>>>>>>>>>>>>> LLVM's File I/O abstractions are based around mmapping a file
>>>>>>>>>>>>>>>>>>>> and writing to it, which doesn't work when you don't know the
>>>>>>>>>>>>>>>>>>>> file size in advance.  So there would be some interesting
>>>>>>>>>>>>>>>>>>>> problems to solve here.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Thu, Jan 17, 2019 at 10:03 AM Vivien Millet <
>>>>>>>>>>>>>>>>>>>> vivien.millet at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Hi Zachary !
>>>>>>>>>>>>>>>>>>>>> Is there a way to easily create a new PDBFileBuilder from an
>>>>>>>>>>>>>>>>>>>>> existing PDBFile, or can/should I do the translation myself ?
>>>>>>>>>>>>>>>>>>>>> I would like to start from a builder filled with the EXE PDB
>>>>>>>>>>>>>>>>>>>>> data and then complete its DBI stream with the JIT
>>>>>>>>>>>>>>>>>>>>> module/symbols.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Thanks !
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Wed, Jan 16, 2019 at 23:41, Vivien Millet <
>>>>>>>>>>>>>>>>>>>>> vivien.millet at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Thank you Zachary !
>>>>>>>>>>>>>>>>>>>>>> I will have some soon, I think ..
>>>>>>>>>>>>>>>>>>>>>> I first need to explore the llvm-pdbutil code more, because I
>>>>>>>>>>>>>>>>>>>>>> don't even know where to start with the PDB api..
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Wed, Jan 16, 2019 at 22:51, Zachary Turner <
>>>>>>>>>>>>>>>>>>>>>> zturner at google.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Sure. Along the way I’m happy to answer any specific questions
>>>>>>>>>>>>>>>>>>>>>>> you might have too, even if it’s for your downstream project.
>>>>>>>>>>>>>>>>>>>>>>> On Wed, Jan 16, 2019 at 1:38 PM Vivien Millet <
>>>>>>>>>>>>>>>>>>>>>>> vivien.millet at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> I would be up for improving pdbutil, but I doubt I have enough
>>>>>>>>>>>>>>>>>>>>>>>> knowledge or time to provide the complete merge feature; it
>>>>>>>>>>>>>>>>>>>>>>>> would still be a very specific kind of merge as you describe
>>>>>>>>>>>>>>>>>>>>>>>> it. Anyway, I could start by trying to do it in my jit
>>>>>>>>>>>>>>>>>>>>>>>> compiler and then, once I get something working (if that
>>>>>>>>>>>>>>>>>>>>>>>> happens :)), I can come back to you with the piece of code and
>>>>>>>>>>>>>>>>>>>>>>>> see if it is worth integrating into pdbutil, and how ?
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Jan 16, 2019 at 22:12, Zachary Turner <
>>>>>>>>>>>>>>>>>>>>>>>> zturner at google.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Well, that’s certainly possible, but improving llvm-pdbutil is
>>>>>>>>>>>>>>>>>>>>>>>>> another possibility. Doing it directly in your jit compiler
>>>>>>>>>>>>>>>>>>>>>>>>> will probably save you time though, since you won’t have to
>>>>>>>>>>>>>>>>>>>>>>>>> worry about writing tests and going through code review.
>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Jan 16, 2019 at 1:01 PM Vivien Millet <
>>>>>>>>>>>>>>>>>>>>>>>>> vivien.millet at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for the tips !
>>>>>>>>>>>>>>>>>>>>>>>>>> When you talk about doing all of this, I suppose you mean
>>>>>>>>>>>>>>>>>>>>>>>>>> using llvm/debuginfo/pdb, picking code here and there to
>>>>>>>>>>>>>>>>>>>>>>>>>> generate the pdb in memory, reading the executable one and
>>>>>>>>>>>>>>>>>>>>>>>>>> performing the merge directly in my jit compiler, right ? Not
>>>>>>>>>>>>>>>>>>>>>>>>>> using pdbutil ?
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> On Tue, Jan 15, 2019 at 22:49, Zachary Turner <
>>>>>>>>>>>>>>>>>>>>>>>>>> zturner at google.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> On Tue, Jan 15, 2019 at 2:50 AM Vivien Millet <
>>>>>>>>>>>>>>>>>>>>>>>>>>> vivien.millet at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hello Zachary !
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for your time !
>>>>>>>>>>>>>>>>>>>>>>>>>>>> So you are one of the happy guys who suffered from the lack of
>>>>>>>>>>>>>>>>>>>>>>>>>>>> PDB format information :)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Yes, that would be me :)
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> To be honest I'm really a beginner with the PDB stuff, I just
>>>>>>>>>>>>>>>>>>>>>>>>>>>> read some llvm documentation to understand what went wrong
>>>>>>>>>>>>>>>>>>>>>>>>>>>> when merging my PDBs.
>>>>>>>>>>>>>>>>>>>>>>>>>>>> In my case, what my team and I are trying to achieve is this:
>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Run our application under the Visual Studio debugger
>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Generate JIT code (using llvm MCJIT)
>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Then, either:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>   - export it as a COFF obj file with DWARF information and
>>>>>>>>>>>>>>>>>>>>>>>>>>>>     then convert it with cv2pdb to obtain a pdb of my JIT
>>>>>>>>>>>>>>>>>>>>>>>>>>>>     symbols (what I do now)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>   - export my JIT debug info directly to PDB (what I would
>>>>>>>>>>>>>>>>>>>>>>>>>>>>     like to do, if you have an idea how..)
>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Detach the Visual Studio debugger
>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Merge my JIT pdb into a copy of the executable pdb (where
>>>>>>>>>>>>>>>>>>>>>>>>>>>>   things start to go bad..)
>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Replace the original executable with the copy (creating a
>>>>>>>>>>>>>>>>>>>>>>>>>>>>   backup of the original)
>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Reattach the Visual Studio debugger to my executable
>>>>>>>>>>>>>>>>>>>>>>>>>>>>   (loading the new pdb version)
>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Debug JIT code with Visual Studio.
>>>>>>>>>>>>>>>>>>>>>>>>>>>> - On each JIT rebuild, restart these steps from the original
>>>>>>>>>>>>>>>>>>>>>>>>>>>>   native executable PDB to avoid merge conflicts between the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>   multiple JIT iterations
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Yea, it's an interesting use case.  It makes me think it would
>>>>>>>>>>>>>>>>>>>>>>>>>>> be nice if the PDB format supported some way of having a symbol
>>>>>>>>>>>>>>>>>>>>>>>>>>> which simply refers to another PDB file, that way you could
>>>>>>>>>>>>>>>>>>>>>>>>>>> re-write that PDB file at runtime once all your code is jitted,
>>>>>>>>>>>>>>>>>>>>>>>>>>> and when the debugger tries to look up that symbol, it finds a
>>>>>>>>>>>>>>>>>>>>>>>>>>> record that tells it to go check the other PDB file.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> So, here are the things I think you would need to do:
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> 1) Create a JIT module in the module list with a unique name.
>>>>>>>>>>>>>>>>>>>>>>>>>>> All symbols will go here.  llvm-pdbutil dump -modules shows you
>>>>>>>>>>>>>>>>>>>>>>>>>>> the list.  Be careful about putting it at the end though,
>>>>>>>>>>>>>>>>>>>>>>>>>>> because there's already one at the end called * LINKER * that
>>>>>>>>>>>>>>>>>>>>>>>>>>> is kind of special.  On the other hand, you don't want to put
>>>>>>>>>>>>>>>>>>>>>>>>>>> it first because it means you will have to do lots of fixups on
>>>>>>>>>>>>>>>>>>>>>>>>>>> the EXE PDB.  It's probably best to add it right before the
>>>>>>>>>>>>>>>>>>>>>>>>>>> linker module, this has the least chance of breaking anything.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> 2) In the debug stream for this module, add all symbols.  You
>>>>>>>>>>>>>>>>>>>>>>>>>>> will need to fix up their type indices.  As you noticed,
>>>>>>>>>>>>>>>>>>>>>>>>>>> llvm-pdbutil already merges type information from the JIT PDB,
>>>>>>>>>>>>>>>>>>>>>>>>>>> so after merging the type indices in the EXE PDB will be
>>>>>>>>>>>>>>>>>>>>>>>>>>> different than they were in the JIT PDB, but the symbol records
>>>>>>>>>>>>>>>>>>>>>>>>>>> will refer to the JIT PDB type indices.  So these need to be
>>>>>>>>>>>>>>>>>>>>>>>>>>> fixed up.  LLD already has code to do this, you can probably
>>>>>>>>>>>>>>>>>>>>>>>>>>> borrow a similar algorithm with some slight modifications
>>>>>>>>>>>>>>>>>>>>>>>>>>> (lld/COFF/PDB.cpp, search for mergeSymbolRecords)
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> 3) Merge in the new section contributions and section map.  See
>>>>>>>>>>>>>>>>>>>>>>>>>>> LLD again for how to modify these.  Hopefully the object file
>>>>>>>>>>>>>>>>>>>>>>>>>>> you exported contains relocated symbol addresses so you don't
>>>>>>>>>>>>>>>>>>>>>>>>>>> have to do any fixups here.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> 4) Merge in the publics and globals.  This shouldn't be too
>>>>>>>>>>>>>>>>>>>>>>>>>>> hard, I think you can just iterate over them in the JIT PDB and
>>>>>>>>>>>>>>>>>>>>>>>>>>> add them to the new EXE PDB.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> You're kind of in uncharted territory here, so this is just a
>>>>>>>>>>>>>>>>>>>>>>>>>>> rough idea of what needs to be done.  There may be other issues
>>>>>>>>>>>>>>>>>>>>>>>>>>> that you don't encounter until you actually try it out.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Unfortunately I don't personally have the time to work on this,
>>>>>>>>>>>>>>>>>>>>>>>>>>> but it sounds neat, and I'm happy to help if you run into
>>>>>>>>>>>>>>>>>>>>>>>>>>> questions or problems along the way.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>

Vivien Millet via llvm-dev

2019-Jan-29 21:49 UTC


[llvm-dev] [llvm-pdbutil] : merge not working properly

Thanks again for your detailed tips !
I tried with "pretty" and it does crash while writing a symbol name inside
the parameter list, so I'm already looking into the mangling code in clang,
which is a bit discouraging ... I will do it step by step, as I don't
understand everything yet. Luckily I have an AST and a visitor on my side,
so I can build the mangling without having to parse anything more.
Regarding what you say about section contributions, it looks like mine are
correct, since Visual Studio breaks where it should .. right ?


On Mon, Jan 28, 2019 at 20:58, Zachary Turner <zturner at google.com> wrote:
> Two ideas for checking validity of the records:
>
> 1) Use llvm-pdbutil with the "pretty" command line option.  This will use
> the DIA SDK, which as far as I know is the same as what the VS debugger
> uses.  It won't really tell you specifically what is wrong, but it can give
> you some hints, because if it crashes somewhere, then at least you know
> *where* it's crashing.  For example, if it's crashing on trying to get the
> address of a symbol, then you might check the symbol record's
> section/offset, look at the section map and section contributions, compare
> those to your executable, and make sure everything matches up.
>
> 2) Use cvdump.  This can also be a hint about which specific records it
> doesn't like.
>
> If you're not using C++ mangling, then that could definitely be a
> problem.  We have MicrosoftMangle.cpp which is in clang, so it might take
> some work to hook it up in your case, as it's meant to be used from the C++
> compiler frontend, and not from the JIT.  But if you can get that to work,
> that would be a good thing to try.
>
> When it comes to things like this, my strategy is always that if I have to
> ask myself "I'm not doing X and I know normally the compiler does X, I
> wonder if I should do that?" then the answer is always "yes".  It's only
> after I have no other ideas and everything looks identical that I really
> get stuck.
>
> To answer your question about section contributions, they are necessary
> because the debugger needs to be able to determine which "module" (i.e.
> source / object file) a symbol comes from.  Imagine, for example, that the
> debugger is trying to find what symbol is at a particular address, say
> 0x12345678.  Then it will subtract the load address of the module (let's
> say 0x10000000) and get the address 0x02345678.  Then in order to find the
> symbol for this, it has to first find the module.  So it looks in the
> section headers (llvm-pdbutil dump -section-headers), and finds which
> section would contain the address 0x02345678.  Pretend that it finds that
> it is section #1, the .text section, and the section header says the base
> address is 0x02300000.  So now it knows that it is offset 0x45678 from
> section #1.
>
> It can then binary search the section contributions, since they are sorted
> by module and offset, to find an entry that contains its size and section
> properties.
>
> In the reverse direction, many symbol records contain a section/offset.
> So in this case, if it has one of these records, it can binary search in
> the section contribs to find the properties.
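
The lookup described above can be sketched in two steps. This is only an illustration of the idea, not how any real debugger is implemented: SectionHeader, SectionContrib, toSectionOffset and findModule are hypothetical names (not LLVM or DIA APIs), and the sketch assumes the contributions are sorted by section number and then by offset.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-ins for the PDB / PE structures.
struct SectionHeader {
  uint32_t VirtualAddress; // RVA at which the section starts
  uint32_t VirtualSize;    // size of the section in memory
};

struct SectionContrib {
  uint16_t Section; // 1-based section number
  uint32_t Offset;  // start of the contribution within the section
  uint32_t Size;    // size of the contribution in bytes
  uint16_t Module;  // index of the module (object file) it came from
};

struct SectionOffset {
  uint16_t Section;
  uint32_t Offset;
};

// Ordering assumed for the contribution table: by section, then by offset.
static bool before(const SectionContrib &A, const SectionContrib &B) {
  return A.Section != B.Section ? A.Section < B.Section : A.Offset < B.Offset;
}

// Step 1: virtual address -> RVA -> (section, offset).
bool toSectionOffset(uint64_t Address, uint64_t LoadAddress,
                     const std::vector<SectionHeader> &Headers,
                     SectionOffset &Out) {
  if (Address < LoadAddress)
    return false;
  const uint64_t RVA = Address - LoadAddress;
  for (std::size_t I = 0; I < Headers.size(); ++I) {
    const SectionHeader &H = Headers[I];
    if (RVA >= H.VirtualAddress && RVA - H.VirtualAddress < H.VirtualSize) {
      Out.Section = static_cast<uint16_t>(I + 1);
      Out.Offset = static_cast<uint32_t>(RVA - H.VirtualAddress);
      return true;
    }
  }
  return false;
}

// Step 2: binary search the sorted contributions; returns the owning module
// index, or -1 if no contribution covers the given section:offset.
int findModule(const std::vector<SectionContrib> &Sorted, SectionOffset Key) {
  assert(std::is_sorted(Sorted.begin(), Sorted.end(), before));
  const SectionContrib Probe = {Key.Section, Key.Offset, 0, 0};
  auto It = std::upper_bound(Sorted.begin(), Sorted.end(), Probe, before);
  if (It == Sorted.begin())
    return -1;
  --It; // last contribution that starts at or before the key
  if (It->Section != Key.Section || Key.Offset - It->Offset >= It->Size)
    return -1;
  return It->Module;
}
```

Using the numbers from the explanation, address 0x12345678 with load address 0x10000000 and a section based at 0x02300000 resolves to section #1, offset 0x45678, which is then looked up in the contribution table.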
>
> On Mon, Jan 28, 2019 at 11:21 AM Vivien Millet <vivien.millet at gmail.com>
> wrote:
>
>> Hello Zachary,
>> Sorry for replying so late, but I have spent the past week thinking and
>> working hard on your "dll memory buffer" idea to see if it works and give
>> you feedback !
>> And it works pretty well so far.
>> Here is what I did, shared for the list:
>> - create a .ASM file full of "int 3" instructions (to ensure that if we
>> execute past the boundaries we break instantly).
>> - Compile this to a .DLL
>> - use a hex editor to change the ".text" section Characteristics from
>> Read/Execute to Read/Write/Execute
>> - run my program which does JIT compilation
>> - get the start RVA of the .text section (which is always 0x1000 in my
>> case)
>> - Load the .DLL and use ModuleAddress+RVA as a memory buffer in a
>> custom DllMemMgr I give to MCJIT
>> - On NotifyObjectEmitted, replace the dll pdb with a custom one I build
>> myself with your PDBFileBuilder
>> - On finalizing memory, first reload the dll to trigger Visual Studio pdb
>> reloading (not working, I don't know why yet), ensure it goes into the
>> same virtual address space, and protect the memory using VirtualProtect.
>> - Place a breakpoint in my JIT file, it displays "loaded", execute the JIT
>> code, it breaks
>> ...and ....
>> * drums *
>> Visual Studio CRASHES when I open the Watch window or Locals/Auto/etc ...
>> and this happens every time, I don't know why..
>> I noticed, when compiling the C++ equivalent of my JIT program, that a
>> simple "int param" is written with size=20 in the C++ pdb and size=16 in
>> my JIT pdb. Do you know what this "size" attribute represents in the
>> S_LOCAL symbol section ? I suspect a problem in the symbol section is
>> behind the watch issue .. but I am not sure. If you have an idea...
>> I also got an "illegal instruction" exception when stepping with F10
>> after the break, but when I don't break, the code runs well..
>>
>> A lot of mysteries there again...
>>
>> Visual Studio displays the disassembly correctly, with the debug lines in
>> the right place, etc .. so I don't get why Visual Studio crashes..
>> Another issue I have is that I always have to remove and re-add my
>> breakpoint for Visual Studio to really break, even if it says "I'm a good
>> breakpoint at that good address". Is this related to file checksums ? It
>> seems mine has a "none" checksum, so I suspect this to be the problem ..
>> but I don't know how to fix it, because I added the checksum with
>> addChecksum with the right file name and I still get "none" in the dump...
>> So right now I'm quite hopeful because I get something reacting in
>> Visual Studio, but I have no idea why it crashes..
>>
>> Have you already encountered this issue when testing your generated pdbs ?
>> Do you know the role of Section Contributions in the PDB/debugging
>> session ?
>> Any tip for checking symbol record validity in the dump ? It looks good to
>> me, with no "???" or "Error" anywhere ..
>>
>> Thank you !
>>
>>
>>
>>
>>
>> On Wed, Jan 23, 2019 at 22:29, Zachary Turner <zturner at google.com>
>> wrote:
>>
>>> .text is where code goes, I don't know why it's called .text, it's just
>>> been that way for many decades and the name stuck around.  But actually
>>> you can call the section whatever you want.  Maybe it's even better to
>>> call it something other than .text, because .text is where your DllMain
>>> and other stuff will be.  You could call it .jit if you wanted to.  You
>>> should be able to create the section with whatever flags you want to.
>>> You'll need to produce a jit_code.obj probably compiled from assembly that
>>> makes a section named .jit and sets the flags to be executable (you can
>>> just copy the flags from a normal .text section of some other program).
>>> Then link this file together along with a jitted_code_main.obj which you
>>> compiled from a simple source file with a DllMain function that does
>>> nothing.  This would make jitted_code.dll, then have your program link
>>> against jitted_code.lib.
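
For reference, the flag change discussed in this thread (Read/Execute to Read/Write/Execute) comes down to one bit in the section's Characteristics field. A small sketch using the bit values from the PE/COFF specification; the C++ constant and function names are made up for this example, and the precondition that the section already contains code is an assumption of the sketch, not a requirement of the format.

```cpp
#include <cassert>
#include <cstdint>

// Bits of the section Characteristics field (PE/COFF specification).
// The spec names are in the comments; the C++ names are hypothetical.
constexpr uint32_t ScnCntCode    = 0x00000020u; // IMAGE_SCN_CNT_CODE
constexpr uint32_t ScnMemExecute = 0x20000000u; // IMAGE_SCN_MEM_EXECUTE
constexpr uint32_t ScnMemRead    = 0x40000000u; // IMAGE_SCN_MEM_READ
constexpr uint32_t ScnMemWrite   = 0x80000000u; // IMAGE_SCN_MEM_WRITE

// Flags commonly found on a linker-produced .text section: code, R+X.
constexpr uint32_t typicalTextFlags() {
  return ScnCntCode | ScnMemExecute | ScnMemRead;
}

constexpr bool isWritable(uint32_t Characteristics) {
  return (Characteristics & ScnMemWrite) != 0;
}

// Add write access so that a JIT can emit code into the loaded section.
inline uint32_t makeWritable(uint32_t Characteristics) {
  assert((Characteristics & ScnCntCode) != 0 && "expected a code section");
  return Characteristics | ScnMemWrite;
}
```

A typical .text value of 0x60000020 becomes 0xE0000020 once the write bit is set, which is the value a hex editor would show after the manual patch described elsewhere in the thread.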
>>>
>>> Right now you jit the code into some buffer that you created with
>>> VirtualAlloc.  If you do the above, it will load jitted_code.dll into
>>> memory and the OS loader will allocate some memory for each section.  So
>>> this would be like your VirtualAlloc, you can just find the address of the
>>> .jit section and use that buffer instead of the VirtualAlloc buffer as the
>>> target address of your jit operations.
>>>
>>> Again, this is just an idea, no promises it will work, but unfortunately
>>> that's kind of the best you can do when dealing with closed source things,
>>> just make guesses and hope for the best.
>>>
>>>
>>>
>>> On Wed, Jan 23, 2019 at 12:42 PM Vivien Millet <vivien.millet at gmail.com>
>>> wrote:
>>>
>>>> (Yes, you are right, this is my fault)
>>>> Regarding the string table, it only seems to contain file-related
>>>> information in every pdb I am using, and it looks correct, but I will
>>>> check it.
>>>> I looked at the PDB.cpp code for checksums and tables. I copied some
>>>> stuff and got things wrong according to cvdump, then I simplified the
>>>> process of copying the table and it worked (in cvdump it finds the file
>>>> matching the line, etc...), so I suspect this is also correct.
>>>>
>>>> All the streams look good, but I will check deeper !
>>>>
>>>> What you say about RVAs and modules seems right, and this is what I'm
>>>> afraid of: doing all of this for nothing, or almost..
>>>>
>>>> Your idea of a .text section in a separate dll looks good, but will it
>>>> be executable memory ? .text is where static strings go, right ?
>>>> When you say putting my jit in there, do you mean writing it once
>>>> jitted_code.dll is loaded in memory, or into the .dll file directly
>>>> before loading it ? In the first scenario I wonder if the section will
>>>> be executable; in the second scenario I can’t do it, because it would
>>>> require perfect linking with the other code my jit points to..
>>>>
>>>> On Wed, Jan 23, 2019 at 20:57, Zachary Turner <zturner at google.com>
>>>> wrote:
>>>>
>>>>> (BTW, I'm adding llvm-dev back to the list, I
didn't notice it got
>>>>> taken off.  In general I try to keep the list on all
emails, even if it's
>>>>> extremely technical and specific, because someday someone
else will try to
>>>>> do this, and it'll be nice if they can read the whole
thread).
>>>>>
>>>>> I can think of a couple of things that might be wrong:
>>>>>
>>>>> 1) If the string table is in a different order, then anything that
>>>>> refers to the string table needs to be changed to refer to the new
>>>>> offset.  If the string "foo" is at offset 12 in the old PDB, but offset
>>>>> 15 in the new PDB, then somewhere there is a record which is going to
>>>>> look at offset 12 and expect to find "foo", and that will mess up.  The
>>>>> main place this is important is in the File Checksums table: there is
>>>>> an entry that says which file it is a checksum for, and that refers to
>>>>> the string table.  However, it's possible for certain symbol records to
>>>>> refer to the string table too.  See lld/COFF/PDB.cpp and Ctrl+F for
>>>>> "PDBStrTab" and you will find some information about this.
>>>>>
>>>>> 2) When you run `llvm-pdbutil dump -streams` on the copied PDB, do all
>>>>> of them show a reasonable description?  Are there any streams that say
>>>>> (???)?  If so, that's a problem.
>>>>>
>>>>> > will Visual Studio consider a symbol file broken if the address goes
>>>>> > beyond the official module address range (the compiled one), because
>>>>> > my JIT code is allocated after the end of the module with
>>>>> > VirtualAlloc
>>>>> That is a good question, and part of why my job is so difficult,
>>>>> because I can't look at their code.  But I think the answer is
>>>>> "probably".  The debugger has to have some way to convert an address in
>>>>> your running process into a symbol and offset, because that's how all
>>>>> debug info is represented in the PDB.  So if there is no module, then
>>>>> there is no RVA (because the R in RVA means relative, and what would it
>>>>> be relative to?).
>>>>>
>>>>> One idea to test this would be to create a DLL called jitted_code.dll,
>>>>> give it a huuuuuge .text section (probably just a .asm file and use
>>>>> some assembly directives to allocate a very large series of null
>>>>> bytes), and then write your jit code into that area.  This way you
>>>>> would not need to modify the existing PDB, you would only need to make
>>>>> a new PDB called jitted_code.pdb with 1 module, and those symbols could
>>>>> have meaningful RVAs.  And you might not even need to detach the
>>>>> debugger if you do things this way, because you could just right click
>>>>> the jitted_code.dll module in the modules window and choose Load
>>>>> Symbols.
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Jan 23, 2019 at 11:13 AM Vivien Millet <
>>>>> vivien.millet at gmail.com> wrote:
>>>>>
>>>>>> Yes, this is it, I just make a copy of a PDB generated by link.exe
>>>>>> (the Microsoft one).
>>>>>> Using llvm-pdbutil to compare is what I do, except I do it with "-all".
>>>>>> And I get almost everything the same: same number of streams, section
>>>>>> map looks good, string table looks good (even if not in the same
>>>>>> order), same number of modules with practically the same symbols and
>>>>>> subsections. This is why I am stuck: I am missing something, but I
>>>>>> can't see what because I don't know where to look. Visual Studio works
>>>>>> with it, I can debug my original exe, but probably without the
>>>>>> globals...
>>>>>> The other problem is that a difference between the dumps is not
>>>>>> necessarily a bug, because the builder may generate new hash values,
>>>>>> reorder streams, modules, etc.
>>>>>>
>>>>>> For now I gave up on having publics and globals streams and attacked
>>>>>> the real goal: inserting my JIT CodeView into the PDB. Again I have
>>>>>> done « something », but as I don’t understand how the format works, I
>>>>>> don’t have it working in Visual Studio... except once: a single time
>>>>>> it worked and the breakpoint turned on in the UI (even if the RVA was
>>>>>> broken for the instructions), but it happened only that one time, and
>>>>>> the failures since then are depressing. cvdump displays it all as
>>>>>> « correct », with apparently nothing corrupt. But what I do is
>>>>>> probably wrong somewhere. What I do is take .debug$S and .debug$T as
>>>>>> is, without relocations, just to see. What I really don’t know is:
>>>>>> will Visual Studio consider a symbol file broken if the address goes
>>>>>> beyond the official module address range (the compiled one)? My JIT
>>>>>> code is allocated after the end of the module with VirtualAlloc.
>>>>>> Another thing I don’t get is the section contribution: what is it
>>>>>> exactly? I inserted section contributions for all sections except the
>>>>>> debug$ ones, but I don’t know what I’m really doing, and that is my
>>>>>> usual problem in implementing this JIT feature...
>>>>>> I also don’t know what relocations are inside the CodeView format.
>>>>>> What is the difference between an RVA and a relocation, and does it
>>>>>> matter for the CodeView part I need to insert in the PDB? I don’t see
>>>>>> why Visual Studio needs more than just an RVA<->line mapping.
>>>>>> Being so ignorant and trying to guess what Visual Studio does is
>>>>>> really driving me crazy...
>>>>>>
>>>>>> On Mon, Jan 21, 2019 at 19:50, Zachary Turner <zturner at google.com>
>>>>>> wrote:
>>>>>>
>>>>>>> So if I understand correctly, you're basically just trying to
>>>>>>> implement something like a pdb *copy*, just as a test to see if you
>>>>>>> can get it to work.  So you generate a PDB with cl/link or
>>>>>>> clang-cl/lld-link, then try to copy it using your tool, then see if
>>>>>>> it still works.
>>>>>>>
>>>>>>> If this is correct, and it's not working, then there is probably just
>>>>>>> something you didn't copy.  Neither Publics nor Globals actually
>>>>>>> contain their own data, instead they just refer to records from the
>>>>>>> corresponding module stream.  So an S_PROCREF for the function "main"
>>>>>>> might have fields that say "the name of the function is main, and
>>>>>>> it's at offset 20 of module 1".  So, if there is no module 1, or if
>>>>>>> offset 20 of module 1 is not actually an S_GPROC32 for the function
>>>>>>> main, then it will be broken.
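
The consistency rule in the paragraph above can be written down as a small check. This is only a sketch: `Kind`, `ModuleSymbol`, `ModuleStream`, `ProcRef` and `procRefResolves` are hypothetical names for this note, module streams are modelled as a map from byte offset to record, and the 1-based module numbering is an assumption taken from the "module 1" wording.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified record kinds; not the LLVM CodeView enums.
enum class Kind { GProc32, LProc32, Other };

struct ModuleSymbol {
  Kind K;
  std::string Name;
};

// A module stream modelled as symbol records keyed by their byte offset.
using ModuleStream = std::map<uint32_t, ModuleSymbol>;

// The fields a procedure reference carries: a name, a module number
// (assumed 1-based here) and an offset into that module's stream.
struct ProcRef {
  std::string Name;
  uint16_t Module;
  uint32_t SymOffset;
};

// True only if the referenced module exists and the record at the given
// offset is a procedure with the same name.
bool procRefResolves(const std::vector<ModuleStream> &Modules,
                     const ProcRef &Ref) {
  if (Ref.Module == 0 || Ref.Module > Modules.size())
    return false;
  const ModuleStream &Stream = Modules[Ref.Module - 1];
  auto It = Stream.find(Ref.SymOffset);
  if (It == Stream.end())
    return false;
  const ModuleSymbol &Sym = It->second;
  bool IsProc = Sym.K == Kind::GProc32 || Sym.K == Kind::LProc32;
  return IsProc && Sym.Name == Ref.Name;
}
```

Running a check of this shape over every reference in a copied PDB is one way to find references that were copied without the module record they point to.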
>>>>>>>
>>>>>>> Did you also go through each module in the source PDB, add a new
>>>>>>> module in the target PDB, then copy all of the symbols for each one?
>>>>>>>
>>>>>>> The best way to find differences is by using llvm-pdbutil on the
>>>>>>> source and target PDBs and looking for things that look different.
>>>>>>> For example, I'd start with llvm-pdbutil dump -streams and then see
>>>>>>> if they even have all the same streams.  If one of them is missing
>>>>>>> streams, that's a good place to start.  If they have the same
>>>>>>> streams, then look for ones where the size is different.  Then drill
>>>>>>> into those to see why the size is different.
>>>>>>>
>>>>>>> LMK if that helps.
>>>>>>>
>>>>>>> On Mon, Jan 21, 2019 at 10:03 AM Vivien Millet <
>>>>>>> vivien.millet at gmail.com> wrote:
>>>>>>>
>>>>>>>> For now I'm not merging my JIT CodeView section, I'm only trying to
>>>>>>>> build a pure copy of an existing PDB using the XxxBuilder classes
>>>>>>>> (PDBFileBuilder & Co / reading a PDBFile) and checking whether Visual
>>>>>>>> Studio wants to eat it.
>>>>>>>> For Publics and Globals, what I do is naive: I use the
>>>>>>>> GSIStreamBuilder and pray :)
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>   if (File.hasPDBGlobalsStream() && File.getPDBGlobalsStream()) {
>>>>>>>>     GSIStreamBuilder &builder = this->getGsiBuilder();
>>>>>>>>     GlobalsStream &stream = *File.getPDBGlobalsStream();
>>>>>>>>     SymbolStream &SymbolRecords = cantFail(File.getPDBSymbolStream());
>>>>>>>>
>>>>>>>>     for (uint32_t PubSymOff : stream.getGlobalsTable()) {
>>>>>>>>       CVSymbol Sym = SymbolRecords.readRecord(PubSymOff);
>>>>>>>>       builder.addGlobalSymbol(Sym);
>>>>>>>>     }
>>>>>>>>   }
>>>>>>>>   if (File.hasPDBPublicsStream() && File.getPDBPublicsStream()) {
>>>>>>>>     GSIStreamBuilder &builder = this->getGsiBuilder();
>>>>>>>>     PublicsStream &stream = *File.getPDBPublicsStream();
>>>>>>>>     SymbolStream &SymbolRecords = cantFail(File.getPDBSymbolStream());
>>>>>>>>
>>>>>>>>     std::vector<PublicSym32> Publics;
>>>>>>>>
>>>>>>>>     for (uint32_t PubSymOff : stream.getPublicsTable()) {
>>>>>>>>       PublicSym32 Pub = cantFail(
>>>>>>>>           llvm::codeview::SymbolDeserializer::deserializeAs<PublicSym32>(
>>>>>>>>               SymbolRecords.readRecord(PubSymOff)));
>>>>>>>>       Publics.push_back(Pub);
>>>>>>>>     }
>>>>>>>>
>>>>>>>>     if (!Publics.empty()) {
>>>>>>>>       // Sort the public symbols and add them to the stream.
>>>>>>>>       std::sort(Publics.begin(), Publics.end(),
>>>>>>>>                 [](const PublicSym32 &L, const PublicSym32 &R) {
>>>>>>>>                   return L.Name < R.Name;
>>>>>>>>                 });
>>>>>>>>       for (const PublicSym32 &Pub : Publics)
>>>>>>>>         builder.addPublicSymbol(Pub);
>>>>>>>>     }
>>>>>>>>   }
>>>>>>>>
>>>>>>>> Is this what you meant?
>>>>>>>>
>>>>>>>> On Mon, Jan 21, 2019 at 18:50, Zachary Turner <zturner at google.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Also, even if symbolGoesInGlobalsStream returns true, you can’t just
>>>>>>>>> copy it. Functions, for example, which are S_GPROC32 or S_LPROC32 in
>>>>>>>>> the module stream, are S_PROCREF in the globals stream. Similarly,
>>>>>>>>> *everything* in the publics stream is S_PUB32. So you need to convert
>>>>>>>>> each symbol to the proper type for the stream it’s going to go in.
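
A compact way to picture that conversion is a mapping from the record kind found in the module stream to the kind written into each of the two streams. The enum and both functions below are hypothetical and written for this note. Mapping S_LPROC32 to a separate local variant (S_LPROCREF) is an assumption based on how I recall LLD handling it; the message above only names S_PROCREF.

```cpp
// Hypothetical, simplified set of record kinds; not the LLVM SymbolKind enum.
enum class SymKind { GProc32, LProc32, GData32, ProcRef, LProcRef, Pub32 };

// Kind to write into the globals stream for a record found in a module
// stream. Procedures become references; other kinds are assumed to be
// copied unchanged in this sketch.
SymKind kindInGlobalsStream(SymKind ModuleKind) {
  switch (ModuleKind) {
  case SymKind::GProc32:
    return SymKind::ProcRef;
  case SymKind::LProc32:
    return SymKind::LProcRef; // assumption, see the note above
  default:
    return ModuleKind;
  }
}

// Everything written to the publics stream is a public symbol record.
SymKind kindInPublicsStream(SymKind) { return SymKind::Pub32; }
```

Changing the kind is only part of the job: the new record also needs its own fields filled in (for a procedure reference, the module and offset of the procedure it points to).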
>>>>>>>>> On Mon, Jan 21, 2019 at 9:46 AM Zachary Turner <zturner at google.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Publics are basically a list of everything that has a mangled
>>>>>>>>>> name. To be honest, I don’t know what the debugger uses this for.
>>>>>>>>>>
>>>>>>>>>> Globals is essentially every symbol in the pdb in one large table.
>>>>>>>>>> The reason this is important is because if you type “foo” in the
>>>>>>>>>> watch window, the debugger doesn’t necessarily know what compiland
>>>>>>>>>> foo comes from. So it has to have a way to find everything in the
>>>>>>>>>> entire program no matter what compiland it came from. That’s what
>>>>>>>>>> the globals are.
>>>>>>>>>>
>>>>>>>>>> Both publics and globals are hash tables, so one possible reason
>>>>>>>>>> there might be a problem is that you need to rehash the entire
>>>>>>>>>> table. When you build your modified pdb, I would suggest starting
>>>>>>>>>> with an empty publics / globals stream, adding all items from the
>>>>>>>>>> first pdb by iterating over those records and using a
>>>>>>>>>> GSIStreamBuilder, then adding all your jitted items separately,
>>>>>>>>>> then writing it out. That should make sure it gets hashed
>>>>>>>>>> correctly.
>>>>>>>>>>
>>>>>>>>>> Are you doing that?
>>>>>>>>>>
>>>>>>>>>> Btw, not all symbols belong in the globals / publics stream. Check
>>>>>>>>>> the code in lld and search for symbolGoesInGlobalsStream and
>>>>>>>>>> symbolGoesInPublicsStream to see the logic it uses.
>>>>>>>>>> On Mon, Jan 21, 2019 at 8:36 AM Vivien Millet
>>>>>>>>>> <vivien.millet at gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Zachary, sorry for disturbing you again.
>>>>>>>>>>>
>>>>>>>>>>> I've fixed some problems (StringTable, SectionMap and a few things
>>>>>>>>>>> here and there) and my converted PDB now seems to work inside
>>>>>>>>>>> Visual Studio.
>>>>>>>>>>> But I'm not sure I have full debug features, because I haven't
>>>>>>>>>>> managed to translate Publics and Globals correctly. CVDump says the
>>>>>>>>>>> PDB is corrupted, whereas llvm-pdbutil dump displays them
>>>>>>>>>>> correctly.
>>>>>>>>>>> I don't really understand what the Publics and Globals streams
>>>>>>>>>>> are: whether the symbols are really in the corresponding streams
>>>>>>>>>>> or whether they are just references to somewhere else.
>>>>>>>>>>> The LLVM documentation is not complete about these two streams, so
>>>>>>>>>>> I'm a bit lost on how to handle them or find what is "corrupted"
>>>>>>>>>>> according to CVDump.
>>>>>>>>>>> I used LLD and yaml2pdb as examples to help me with some tough
>>>>>>>>>>> conversions, but I noticed that yaml2pdb exports no GSI stream (no
>>>>>>>>>>> GsiBuilder use and no reference to Publics or Globals anywhere).
>>>>>>>>>>> Is that intended/correct?
>>>>>>>>>>> Thanks, and sorry if I'm spamming a bit; this is my 99%-time task
>>>>>>>>>>> right now and being stuck without any clue is difficult :) But I
>>>>>>>>>>> guess you experienced even more suffering when documentation
>>>>>>>>>>> didn't exist at all!
>>>>>>>>>>> Have a good day!
>>>>>>>>>>>
>>>>>>>>>>> On Sun, Jan 20, 2019 at 22:27, Vivien Millet
>>>>>>>>>>> <vivien.millet at gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> ERRATUM, my bad: the PDB I tested is also corrupted according to
>>>>>>>>>>>> cvdump.exe, I don't know why. I regenerated it and now I have a
>>>>>>>>>>>> working dump. You don't need to fix anything.
>>>>>>>>>>>>
>>>>>>>>>>>> On Sun, Jan 20, 2019 at 20:26, Vivien Millet
>>>>>>>>>>>> <vivien.millet at gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Zachary,
>>>>>>>>>>>>> I've done a first step of rewriting an existing PDBFile with
>>>>>>>>>>>>> PDBFileBuilder. Most of the work is done, but I don't get as much
>>>>>>>>>>>>> output as input (some streams are not mirrored for unknown
>>>>>>>>>>>>> reasons and some data must be missing here and there...).
>>>>>>>>>>>>> When I try to replace the original with the rebuilt one for
>>>>>>>>>>>>> debugging, the PDB loads well but breakpoints fail to activate
>>>>>>>>>>>>> with an "unexpected symbol reader error while processing
>>>>>>>>>>>>> foobar.exe". You probably know what it means or have already
>>>>>>>>>>>>> encountered this error, I guess.
>>>>>>>>>>>>> I also tried to create a minimal program to simplify comparisons
>>>>>>>>>>>>> between the original and the new PDB, but I get an error dumping
>>>>>>>>>>>>> the original PDB exported by Visual Studio with -all
>>>>>>>>>>>>> (PublicsStream.cpp|98). I think it is a bug.
>>>>>>>>>>>>> I've attached the related main.cpp and PDB to this email if you
>>>>>>>>>>>>> want to check what the error is exactly (VS2017; x86 and x64 have
>>>>>>>>>>>>> the same issue).
>>>>>>>>>>>>> I've also attached my code (git diff). I added an « identity »
>>>>>>>>>>>>> feature to pdbutil which uses the code I wrote to regenerate the
>>>>>>>>>>>>> input PDB. You can use it to see what I get so far.
>>>>>>>>>>>>> I’ve seen you recently added a fix related to FPO, but you say
>>>>>>>>>>>>> it’s only for x86, so I don’t think it would change anything,
>>>>>>>>>>>>> but who knows.
>>>>>>>>>>>>> Anyway, if you have a moment to check my work so far and give me
>>>>>>>>>>>>> feedback, it’s welcome, because I am running out of ideas about
>>>>>>>>>>>>> what goes wrong.
>>>>>>>>>>>>> Thanks, I'm going back to digging into the PDB mysteries!
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, Jan 18, 2019 at 12:31, Vivien Millet
>>>>>>>>>>>>> <vivien.millet at gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> OK! It was just to be sure I understood well.
>>>>>>>>>>>>>> Sorry for not replying directly, I wanted to first try emitting
>>>>>>>>>>>>>> CodeView before continuing the discussion, and it was time for
>>>>>>>>>>>>>> me to go to bed here.
>>>>>>>>>>>>>> I just tried it now and it is very easy to switch to CodeView.
>>>>>>>>>>>>>> For those interested: you just have to give your TargetTriple to
>>>>>>>>>>>>>> the llvm::Module used for JIT and then call
>>>>>>>>>>>>>> module->addModuleFlag(llvm::Module::Warning, "CodeView", 1) to
>>>>>>>>>>>>>> tell the AsmPrinter that this module prefers CodeView instead of
>>>>>>>>>>>>>> DWARF.
>>>>>>>>>>>>>> I've checked the content of my .obj file, and there are valid
>>>>>>>>>>>>>> .debug$T and .debug$S sections, so everything is going well so
>>>>>>>>>>>>>> far.
>>>>>>>>>>>>>> Now, as a parallel task, I will try to read the EXE PDB and
>>>>>>>>>>>>>> re-export it "as is" to see if I break something in Visual
>>>>>>>>>>>>>> Studio.
>>>>>>>>>>>>>> If I succeed in doing that, it might be added as a feature to
>>>>>>>>>>>>>> PDBFile or PDBFileBuilder to simplify the process for other
>>>>>>>>>>>>>> users.
>>>>>>>>>>>>>> I'll keep you posted.
>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Thu, Jan 17, 2019 at 20:50, Zachary Turner
>>>>>>>>>>>>>> <zturner at google.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> When I say "nothing to do" I just mean that you won't have to
>>>>>>>>>>>>>>> do anything to convert the record from one format (DWARF) to
>>>>>>>>>>>>>>> another format (CodeView).  You will have a COFF object file
>>>>>>>>>>>>>>> either on disk (probably named foo.obj or something) or in
>>>>>>>>>>>>>>> memory.  And this object file will have a .debug$S section
>>>>>>>>>>>>>>> with CodeView symbols and a .debug$T section with CodeView
>>>>>>>>>>>>>>> types.  Then you will still need to use the PDBFileBuilder to
>>>>>>>>>>>>>>> add these records to the final PDB, but they will already be
>>>>>>>>>>>>>>> in the correct format that PDBFileBuilder expects, you won't
>>>>>>>>>>>>>>> need to convert them from DWARF (which is not trivial).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Thu, Jan 17, 2019 at 11:26 AM Vivien Millet
>>>>>>>>>>>>>>> <vivien.millet at gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> That’s a good question. By default, when emitting the object
>>>>>>>>>>>>>>>> file I choose COFF, but it embeds DWARF and not CodeView in
>>>>>>>>>>>>>>>> the end. There is probably a way to do it, or at least it
>>>>>>>>>>>>>>>> must be implemented if it isn't yet.
>>>>>>>>>>>>>>>> Let's imagine I manage to do that. When you say there is
>>>>>>>>>>>>>>>> nothing to do, I still must have a PDBFileBuilder to copy the
>>>>>>>>>>>>>>>> CodeView data into the EXE PDB, right? I cannot easily insert
>>>>>>>>>>>>>>>> it into the EXE PDB another way?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Thu, Jan 17, 2019 at 20:01, Zachary Turner
>>>>>>>>>>>>>>>> <zturner at google.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Well, is it possible to just hook up the CodeView debug info
>>>>>>>>>>>>>>>>> generator to MCJIT?  If you're not jitting, and you just
>>>>>>>>>>>>>>>>> compile something, we translate all of the LLVM metadata
>>>>>>>>>>>>>>>>> into CodeView in the file CodeViewDebug.cpp.  Then, the
>>>>>>>>>>>>>>>>> object file just already has CodeView in it.  If it's not
>>>>>>>>>>>>>>>>> hard to do, this would probably be a better solution,
>>>>>>>>>>>>>>>>> because you don't have to worry about *how* to translate
>>>>>>>>>>>>>>>>> DWARF into CodeView, which is not always trivial.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> If you can configure this in MCJIT, you won't even need to
>>>>>>>>>>>>>>>>> do anything, you can just open the ObjectFile, look for the
>>>>>>>>>>>>>>>>> .debug$T and .debug$S sections, iterate over each one and
>>>>>>>>>>>>>>>>> re-write their TypeIndices while copying them to the output
>>>>>>>>>>>>>>>>> PDB file.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Thu, Jan 17, 2019 at 10:52 AM Vivien Millet
>>>>>>>>>>>>>>>>> <vivien.millet at gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> OK, I understand better what you meant. In fact I don’t
>>>>>>>>>>>>>>>>>> care about the PDB size, at least as a first step, so
>>>>>>>>>>>>>>>>>> having duplicated symbols won’t be a problem for me.
>>>>>>>>>>>>>>>>>> Concerning TypeIndices, my plan, if possible, is not to
>>>>>>>>>>>>>>>>>> generate a PDB for my JIT and merge it, but instead to
>>>>>>>>>>>>>>>>>> extract debug info directly from a DWARFContext just after
>>>>>>>>>>>>>>>>>> the llvm::object::ObjectFile is emitted by the JIT engine,
>>>>>>>>>>>>>>>>>> and complete the EXE PDB I rebuilt with PDBFileBuilder.
>>>>>>>>>>>>>>>>>> Does that sound like a good bet to you? If I succeed in
>>>>>>>>>>>>>>>>>> doing that, I think it could be a good extension to the
>>>>>>>>>>>>>>>>>> debugging possibilities of MCJIT, if not an extension to
>>>>>>>>>>>>>>>>>> pdbutil.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Thu, Jan 17, 2019 at 19:37, Zachary Turner
>>>>>>>>>>>>>>>>>> <zturner at google.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Well, for example the TPI stream is just one big
>>>>>>>>>>>>>>>>>>> collection of types.  Presumably your JIT code will reuse
>>>>>>>>>>>>>>>>>>> some of the same types (perhaps, std::string for example)
>>>>>>>>>>>>>>>>>>> as your non-jitted code.  Your jitted symbol records in
>>>>>>>>>>>>>>>>>>> the object file (for example, a local variable of type
>>>>>>>>>>>>>>>>>>> std::string in your jitted code) will refer to the type
>>>>>>>>>>>>>>>>>>> for std::string by a TypeIndex, and your original PDB
>>>>>>>>>>>>>>>>>>> will also refer to std::string by a different TypeIndex.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> In LLD, when we merge in types and symbols from each
>>>>>>>>>>>>>>>>>>> object file, we keep a hash table of which types have
>>>>>>>>>>>>>>>>>>> already been seen, so that if we see the same type again,
>>>>>>>>>>>>>>>>>>> we can just use the TypeIndex that we wrote on a previous
>>>>>>>>>>>>>>>>>>> object file.  Then, when we add symbol records, we have
>>>>>>>>>>>>>>>>>>> to update its fields that used the old TypeIndex to use
>>>>>>>>>>>>>>>>>>> the new TypeIndex instead.
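
The "hash table of types already seen" idea can be sketched as follows. `TypeTable` is a hypothetical class for this note, not the LLVM type-merging code, and it treats a type record as an opaque byte string. The starting index 0x1000 reflects my understanding that CodeView reserves lower indices for built-in simple types. A real merger must also rewrite the type indices embedded inside each record before comparing records, which this sketch leaves out.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Assumed first index available to non-simple (user-defined) types.
constexpr uint32_t FirstNonSimpleIndex = 0x1000;

// Hypothetical de-duplicating type table keyed on the raw record bytes.
class TypeTable {
public:
  // Returns the index of RecordBytes, appending the record only when an
  // identical one has not been seen before.
  uint32_t insert(const std::string &RecordBytes) {
    auto It = Seen.find(RecordBytes);
    if (It != Seen.end())
      return It->second;
    uint32_t Index =
        FirstNonSimpleIndex + static_cast<uint32_t>(Records.size());
    Records.push_back(RecordBytes);
    Seen.emplace(RecordBytes, Index);
    return Index;
  }

  std::size_t size() const { return Records.size(); }

private:
  std::vector<std::string> Records;
  std::unordered_map<std::string, uint32_t> Seen;
};
```

Inserting the same record twice returns the same index and leaves the table size unchanged, so a type shared by the EXE and the jitted code is stored once.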
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> De-duplicating though, I suppose, is not strictly
>>>>>>>>>>>>>>>>>>> necessary, it will just keep your PDB size down.  But you
>>>>>>>>>>>>>>>>>>> *will* need to at least re-write the TypeIndexes from the
>>>>>>>>>>>>>>>>>>> jitted code.  For example, you may decide that instead of
>>>>>>>>>>>>>>>>>>> de-duplicating, you just append them all to the end of
>>>>>>>>>>>>>>>>>>> the TPI stream (where all the types go in PDB) to keep
>>>>>>>>>>>>>>>>>>> things simple.  Since they were in a different position
>>>>>>>>>>>>>>>>>>> before, they now have different TypeIndices.  So you will
>>>>>>>>>>>>>>>>>>> need to re-write all TypeIndices so that they are correct
>>>>>>>>>>>>>>>>>>> after the merge.  Both types and symbols can refer to
>>>>>>>>>>>>>>>>>>> types, so you will need to do this both for the types of
>>>>>>>>>>>>>>>>>>> the jitted code as well as the symbols of the jitted
>>>>>>>>>>>>>>>>>>> code.
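
For the append-only variant described above, the rewrite reduces to adding a fixed shift to every non-simple type index. The sketch below uses hypothetical names (`LocalSym`, `remapTypeIndex`, `rewriteSymbols`) and assumes that indices below 0x1000 denote built-in simple types that are identical in both files and must not be shifted.

```cpp
#include <cstdint>
#include <vector>

// Assumed first index available to non-simple (user-defined) types.
constexpr uint32_t FirstNonSimpleIndex = 0x1000;

// Hypothetical symbol record carrying a single type reference.
struct LocalSym {
  uint32_t Type;
};

// New index of a JIT type once all JIT types have been appended after
// ExistingTypeCount types already present in the EXE's TPI stream.
uint32_t remapTypeIndex(uint32_t OldIndex, uint32_t ExistingTypeCount) {
  if (OldIndex < FirstNonSimpleIndex)
    return OldIndex; // simple types keep their index
  return OldIndex + ExistingTypeCount;
}

// Rewrites the type field of every JIT symbol in place. The same shift has
// to be applied to type references stored inside the JIT type records.
void rewriteSymbols(std::vector<LocalSym> &Syms, uint32_t ExistingTypeCount) {
  for (LocalSym &S : Syms)
    S.Type = remapTypeIndex(S.Type, ExistingTypeCount);
}
```

If the EXE PDB already holds 500 types, a JIT symbol that used index 0x1002 ends up at 0x1002 + 500, while a symbol of simple type 0x74 is left alone.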
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Let me know if that makes sense.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Thu, Jan 17, 2019 at 10:24 AM Vivien Millet
>>>>>>>>>>>>>>>>>>> <vivien.millet at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> OK, I see.
>>>>>>>>>>>>>>>>>>>> What do you mean by “making sure to de-duplicate records
>>>>>>>>>>>>>>>>>>>> as necessary”?
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Thu, Jan 17, 2019 at 19:09, Zachary Turner
>>>>>>>>>>>>>>>>>>>> <zturner at google.com> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> It's possible in theory to support incremental updates
>>>>>>>>>>>>>>>>>>>>> to a PDB (the file format is designed specifically with
>>>>>>>>>>>>>>>>>>>>> that in mind).  But this functionality was never added
>>>>>>>>>>>>>>>>>>>>> to the PDB library since lld doesn't support
>>>>>>>>>>>>>>>>>>>>> incremental linking, we never really needed it.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> The "dumb" way would be to just create a new PDB file,
>>>>>>>>>>>>>>>>>>>>> build it using the old contents and the new contents
>>>>>>>>>>>>>>>>>>>>> (making sure to de-duplicate records as necessary).
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Supporting incremental updates should be possible, but
>>>>>>>>>>>>>>>>>>>>> most of LLVM's File I/O abstractions are based around
>>>>>>>>>>>>>>>>>>>>> mmapping a file and writing to it, which doesn't work
>>>>>>>>>>>>>>>>>>>>> when you don't know the file size in advance.  So there
>>>>>>>>>>>>>>>>>>>>> would be some interesting problems to solve here.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Thu, Jan 17, 2019 at 10:03 AM Vivien Millet
>>>>>>>>>>>>>>>>>>>>> <vivien.millet at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Hi Zachary!
>>>>>>>>>>>>>>>>>>>>>> Is there a way to easily create a new PDBFileBuilder
>>>>>>>>>>>>>>>>>>>>>> from an existing PDBFile, or can/should I do the
>>>>>>>>>>>>>>>>>>>>>> translation myself?
>>>>>>>>>>>>>>>>>>>>>> I would like to start from a builder filled with the
>>>>>>>>>>>>>>>>>>>>>> EXE PDB data and then complete its DBI stream with the
>>>>>>>>>>>>>>>>>>>>>> JIT module/symbols.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Wed, Jan 16, 2019 at 23:41, Vivien Millet
>>>>>>>>>>>>>>>>>>>>>> <vivien.millet at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Thank you Zachary!
>>>>>>>>>>>>>>>>>>>>>>> I will have some soon, I think.
>>>>>>>>>>>>>>>>>>>>>>> I first need to explore the llvm-pdbutil code more,
>>>>>>>>>>>>>>>>>>>>>>> because I don't even know where to start with the PDB
>>>>>>>>>>>>>>>>>>>>>>> API.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On Wed, Jan 16, 2019 at 22:51, Zachary Turner
>>>>>>>>>>>>>>>>>>>>>>> <zturner at google.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Sure. Along the way I’m happy to answer any specific
>>>>>>>>>>>>>>>>>>>>>>>> questions you might have too, even if it’s for your
>>>>>>>>>>>>>>>>>>>>>>>> downstream project.
>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Jan 16, 2019 at 1:38 PM Vivien Millet
>>>>>>>>>>>>>>>>>>>>>>>> <vivien.millet at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> I would be up for improving pdbutil, but I doubt I
>>>>>>>>>>>>>>>>>>>>>>>>> have enough knowledge or time to provide the
>>>>>>>>>>>>>>>>>>>>>>>>> complete merge feature; it would still be a very
>>>>>>>>>>>>>>>>>>>>>>>>> specific kind of merge as you describe it. Anyway,
>>>>>>>>>>>>>>>>>>>>>>>>> I could start by trying to do it in my JIT compiler
>>>>>>>>>>>>>>>>>>>>>>>>> and then, once I get something working (if that
>>>>>>>>>>>>>>>>>>>>>>>>> happens :)), I can come back to you with the piece
>>>>>>>>>>>>>>>>>>>>>>>>> of code and see if it is worth integrating into
>>>>>>>>>>>>>>>>>>>>>>>>> pdbutil, and how?
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Jan 16, 2019 at 22:12, Zachary Turner
>>>>>>>>>>>>>>>>>>>>>>>>> <zturner at google.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Well, that’s certainly possible, but improving
>>>>>>>>>>>>>>>>>>>>>>>>>> llvm-pdbutil is another possibility. Doing it
>>>>>>>>>>>>>>>>>>>>>>>>>> directly in your jit compiler will probably save
>>>>>>>>>>>>>>>>>>>>>>>>>> you time though, since you won’t have to worry
>>>>>>>>>>>>>>>>>>>>>>>>>> about writing tests and going through code review.
>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Jan 16, 2019 at 1:01 PM Vivien Millet
>>>>>>>>>>>>>>>>>>>>>>>>>> <vivien.millet at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for the tips!
>>>>>>>>>>>>>>>>>>>>>>>>>>> When you talk about doing all of this, I suppose
>>>>>>>>>>>>>>>>>>>>>>>>>>> you are thinking of using llvm/debuginfo/pdb,
>>>>>>>>>>>>>>>>>>>>>>>>>>> picking code here and there to generate the PDB
>>>>>>>>>>>>>>>>>>>>>>>>>>> in memory, reading the executable one and
>>>>>>>>>>>>>>>>>>>>>>>>>>> performing the merge directly in my JIT compiler,
>>>>>>>>>>>>>>>>>>>>>>>>>>> right? Not using pdbutil?
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> On Tue, Jan 15, 2019 at 22:49, Zachary Turner
>>>>>>>>>>>>>>>>>>>>>>>>>>> <zturner at google.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Tue, Jan 15, 2019 at 2:50 AM Vivien Millet
>>>>>>>>>>>>>>>>>>>>>>>>>>>> <vivien.millet at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hello Zachary!
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for your time!
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> So you are one of the happy guys who suffered
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> from the lack of PDB format information :)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Yes, that would be me :)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> To be honest I'm really a beginner with PDB, I
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> just read some LLVM documentation to understand
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> what went wrong when merging my PDBs.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> In my case, what my team and I are trying to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> achieve is this:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Run our application under the Visual Studio
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   debugger
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Generate JIT code (using LLVM MCJIT)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Then, either:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   - export a COFF obj file with DWARF
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     information and then convert it with cv2pdb
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     to obtain a PDB of my JIT symbols (what I
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     do now)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   - export my JIT debug info directly to PDB
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     (what I would like to do, if you have an
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     idea how)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Detach the Visual Studio debugger
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Merge my JIT PDB into a copy of the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   executable PDB (where things start to go bad)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Replace the original with the copy (creating
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   a backup of the original)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Reattach the Visual Studio debugger to my
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   executable (loading the new PDB version)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Debug JIT code with Visual Studio
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - On each JIT rebuild, restart these steps from
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   the original native executable PDB to avoid
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   merge conflicts between the multiple JIT
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   iterations
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Yea, it's an interesting use case.  It makes me
>>>>>>>>>>>>>>>>>>>>>>>>>>>> think it would be nice if the PDB format
>>>>>>>>>>>>>>>>>>>>>>>>>>>> supported some way of having a symbol which
>>>>>>>>>>>>>>>>>>>>>>>>>>>> simply refers to another PDB file, that way you
>>>>>>>>>>>>>>>>>>>>>>>>>>>> could re-write that PDB file at runtime once all
>>>>>>>>>>>>>>>>>>>>>>>>>>>> your code is jitted, and when the debugger tries
>>>>>>>>>>>>>>>>>>>>>>>>>>>> to look up that symbol, it finds a record that
>>>>>>>>>>>>>>>>>>>>>>>>>>>> tells it to go check the other PDB file.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> So, here are the things I think you would need
>>>>>>>>>>>>>>>>>>>>>>>>>>>> to do:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 1) Create a JIT module in the module list with a
>>>>>>>>>>>>>>>>>>>>>>>>>>>> unique name.  All symbols will go here.
>>>>>>>>>>>>>>>>>>>>>>>>>>>> llvm-pdbutil dump -modules shows you the list.
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Be careful about putting it at the end though,
>>>>>>>>>>>>>>>>>>>>>>>>>>>> because there's already one at the end called
>>>>>>>>>>>>>>>>>>>>>>>>>>>> * LINKER * that is kind of special.  On the
>>>>>>>>>>>>>>>>>>>>>>>>>>>> other hand, you don't want to put it first
>>>>>>>>>>>>>>>>>>>>>>>>>>>> because it means you will have to do lots of
>>>>>>>>>>>>>>>>>>>>>>>>>>>> fixups on the EXE PDB.  It's probably best to
>>>>>>>>>>>>>>>>>>>>>>>>>>>> add it right before the linker module, this has
>>>>>>>>>>>>>>>>>>>>>>>>>>>> the least chance of breaking anything.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 2) In the debug stream for this module, add all
>>>>>>>>>>>>>>>>>>>>>>>>>>>> symbols.  You will need to fix up their type
>>>>>>>>>>>>>>>>>>>>>>>>>>>> indices.  As you noticed, llvm-pdbutil already
>>>>>>>>>>>>>>>>>>>>>>>>>>>> merges type information from the JIT PDB, so
>>>>>>>>>>>>>>>>>>>>>>>>>>>> after merging the type indices in the EXE PDB
>>>>>>>>>>>>>>>>>>>>>>>>>>>> will be different than they were in the JIT PDB,
>>>>>>>>>>>>>>>>>>>>>>>>>>>> but the symbol records will refer to the JIT PDB
>>>>>>>>>>>>>>>>>>>>>>>>>>>> type indices.  So these need to be fixed up.
>>>>>>>>>>>>>>>>>>>>>>>>>>>> LLD already has code to do this, you can
>>>>>>>>>>>>>>>>>>>>>>>>>>>> probably borrow a similar algorithm with some
>>>>>>>>>>>>>>>>>>>>>>>>>>>> slight modifications (lld/COFF/PDB.cpp, search
>>>>>>>>>>>>>>>>>>>>>>>>>>>> for mergeSymbolRecords).
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 3) Merge in the new section contributions and
>>>>>>>>>>>>>>>>>>>>>>>>>>>> section map.  See LLD again for how to modify
>>>>>>>>>>>>>>>>>>>>>>>>>>>> these.  Hopefully the object file you exported
>>>>>>>>>>>>>>>>>>>>>>>>>>>> contains relocated symbol addresses so you don't
>>>>>>>>>>>>>>>>>>>>>>>>>>>> have to do any fixups here.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 4) Merge in the publics and globals.  This
>>>>>>>>>>>>>>>>>>>>>>>>>>>> shouldn't be too hard, I think you can just
>>>>>>>>>>>>>>>>>>>>>>>>>>>> iterate over them in the JIT PDB and add them to
>>>>>>>>>>>>>>>>>>>>>>>>>>>> the new EXE PDB.
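
Step 1 of the list above (placing the JIT module right before the linker module) can be sketched on a plain list of module names. `insertBeforeLinker` is a hypothetical helper written for this note; the linker module's name is passed in by the caller rather than hard-coded, since its exact spelling should be checked against the `llvm-pdbutil dump -modules` output.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Inserts JitModule just before a trailing module named LinkerName, or at
// the very end when the list does not end with that module. Returns the
// index the new module received. Modules before that index keep theirs.
std::size_t insertBeforeLinker(std::vector<std::string> &Modules,
                               const std::string &JitModule,
                               const std::string &LinkerName) {
  std::size_t Pos = Modules.size();
  if (!Modules.empty() && Modules.back() == LinkerName)
    Pos = Modules.size() - 1;
  Modules.insert(Modules.begin() + static_cast<std::ptrdiff_t>(Pos),
                 JitModule);
  return Pos;
}
```

Only the linker module's index changes with this placement, which matches the reasoning that it needs the fewest fixups in the EXE PDB.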
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> You're kind of in uncharted territory here, so
>>>>>>>>>>>>>>>>>>>>>>>>>>>> this is just a rough idea of what needs to be
>>>>>>>>>>>>>>>>>>>>>>>>>>>> done.  There may be other issues that you don't
>>>>>>>>>>>>>>>>>>>>>>>>>>>> encounter until you actually try it out.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Unfortunately I don't personally have the time
>>>>>>>>>>>>>>>>>>>>>>>>>>>> to work on this, but it sounds neat, and I'm
>>>>>>>>>>>>>>>>>>>>>>>>>>>> happy to help if you run into questions or
>>>>>>>>>>>>>>>>>>>>>>>>>>>> problems along the way.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>