thr3ads.net - llvm dev - [llvm-dev] [lldb-dev] Trying out lld to link windows binaries (using msvc as a compiler) [Jan 2018]

Leonardo Santagada via llvm-dev

2018-Jan-25 17:49 UTC

[llvm-dev] [lldb-dev] Trying out lld to link windows binaries (using msvc as a compiler)

I did reorder my sections, so that .debug$H is in the correct place, but
now I get some errors on duplicate symbols. I created a folder with
examples:

https://www.dropbox.com/sh/nmvzi44pi0boe76/AAA0f47O5PCJ9JiUc6wVuwBra?dl=0

t.obj is generated by vs 2015 and it links fine with lld-link.exe, but
tout.obj gives these errors:

lld-link.exe /DEBUG:GHASH tout.obj
LLD-LINK.EXE: error: duplicate symbol: __local_stdio_printf_options in
tout.obj and in LIBCMT.lib(default_local_stdio_options.obj)
LLD-LINK.EXE: error: duplicate symbol: __local_stdio_printf_options in
tout.obj and in libvcruntime.lib(undname.obj)

I'm using PEView from http://wjradburn.com/software/ to look at the files
and can't see anything wrong, except some valid differences in the offsets
being used for the data (so pointer to data is different between them).

I will look into yaml2obj now to see if I see anything else weird going on.


On Thu, Jan 25, 2018 at 6:41 PM, Zachary Turner <zturner at google.com>
wrote:
> I'm pretty confident that cl is not putting anything strange in the
> .debug$T sections.  We've done a lot of testing and never seen anything
> except CodeView type records in a .debug$T.  My hunch is that your objcopy
> patch is probably not doing the right thing in one or more of the section
> headers, and this is confusing the linker.
>
> One idea might be to build a simple object file with clang-cl but without
> the magic -mllvm -emit-codeview-ghash-section, then run your llvm-objcopy
> on it.  Then build the same object file passing -mllvm
> -emit-codeview-ghash-section.  Then run obj2yaml on both and diff the
> results.  They should be byte-for-byte identical.  That should give you a
> clue about if objcopy is doing something wrong.
>
> On Thu, Jan 25, 2018 at 2:21 AM Leonardo Santagada <santagada at gmail.com>
> wrote:
>
>> Don't worry, I definitely want to perfect this to generate legal obj
>> files, this is just to speed up testing.
>>
>> Now after patching all the obj files I get these errors when linking a
>> small part of our code base (msvc 2017 15.5.3, lld and llvm-objcopy 7.0.0):
>> lld-link.exe : error : relocation against symbol in discarded section: $LN8
>> lld-link.exe : error : relocation against symbol in discarded section: $LN43
>> lld-link.exe : error : relocation against symbol in discarded section: $LN37
>>
>> I'm starting to guess that cl.exe might be putting some random comdat or
>> other discardable symbols in the .debug$T and clang doesn't? I will try to
>> debug this and see what more I can uncover.
>>
>> Linking works perfectly without my llvm-objcopy pass to add .debug$H?
>>
>>
>> On Thu, Jan 25, 2018 at 1:53 AM, Zachary Turner <zturner at google.com>
>> wrote:
>>
>>> It might not influence LLD, but at the same time we don't want to
>>> upstream something that is producing technically illegal COFF files.  Also
>>> good to hear about the planned changes to your header files.  Looking
>>> forward to hearing about your experiences with clang-cl.
>>>
>>> On Wed, Jan 24, 2018 at 10:41 AM Leonardo Santagada <santagada at gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I finally got my first .obj file patched with .debug$H to look somewhat
>>>> right. I added the new section at the end of the file so I don't have to
>>>> recalculate all sections (although now I probably could position it in the
>>>> middle, knowing that each section is: SizeOfRawData +
>>>> (last.Header.NumberOfRelocations * (4+4+2)) and the $H needs to come right
>>>> after $T in the file). Although that is illegal according to the COFF spec,
>>>> it doesn't seem like it's going to influence lld.
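The size arithmetic in the message above can be sketched as follows. This is only an illustration under stated assumptions: `SectionInfo`, `kRelocationSize` and `sectionFootprint` are hypothetical names, the struct is a simplified stand-in for a COFF section header, and the sketch assumes each section's relocation records directly follow its raw data and that the relocation count fits in the 16-bit header field.

```cpp
#include <cassert>
#include <cstdint>

// Simplified, hypothetical view of the two section header fields used here.
struct SectionInfo {
  std::uint32_t SizeOfRawData;
  std::uint16_t NumberOfRelocations;
};

// One COFF relocation record is 10 bytes:
// 4 (VirtualAddress) + 4 (SymbolTableIndex) + 2 (Type).
constexpr std::uint32_t kRelocationSize = 4 + 4 + 2;

// Bytes a section occupies in the file, assuming its relocation records
// are stored immediately after its raw data (an assumption of this sketch).
inline std::uint32_t sectionFootprint(const SectionInfo &S) {
  return S.SizeOfRawData + S.NumberOfRelocations * kRelocationSize;
}
```

For example, a section with 100 bytes of raw data and 3 relocations would take 130 bytes under these assumptions.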
>>>>
>>>> Also we talked and we are probably going to do something similar to a
>>>> bunch of windows defines and a check for our own define (to guarantee that
>>>> no one imported windows.h before win32.h) and drop the namespace and the
>>>> conflicting names.
>>>>
>>>>
>>>> On Tue, Jan 23, 2018 at 12:46 AM, Zachary Turner <zturner at google.com>
>>>> wrote:
>>>>
>>>>> It's very possible that a 3rd party indirect header include is
>>>>> involved.  One idea might be like I suggested where you #define _WINDOWS_
>>>>> in win32.h and guarantee that it's always included first.  Then those other
>>>>> headers won't be able to #include <windows.h>.  But it will probably
>>>>> greatly expand the amount of stuff you have to add to win32.h, as you will
>>>>> probably find some callers of functions that aren't yet in your win32.h
>>>>> that you'd have to add.
>>>>>
>>>>>
>>>>>
>>>>> On Mon, Jan 22, 2018 at 3:28 PM Leonardo Santagada <
>>>>> santagada at gmail.com> wrote:
>>>>>
>>>>>> Ok some information was lost on getting this example to you, I'm
>>>>>> sorry for not being clear.
>>>>>>
>>>>>> We have a huge code base, let's say 90% of it doesn't include either
>>>>>> header, 9% includes win32.h and 1% includes both. I will try to discover
>>>>>> why, but my guess is they include both a third party that includes
>>>>>> windows.h and some of our libs that use win32.h.
>>>>>>
>>>>>> I will try to fully understand this tomorrow.
>>>>>>
>>>>>> I guess clang will not implement this ever so finishing the object
>>>>>> copier is the best solution until all code is ported to clang.
>>>>>>
>>>>>> On 23 Jan 2018 00:02, "Zachary Turner" <zturner at google.com> wrote:
>>>>>>
>>>>>>> You said win32.h doesn't include windows.h, but main.cpp does.  So
>>>>>>> what's the disadvantage of just including it in win32.h anyway, since it's
>>>>>>> already going to be in every translation unit?  (Unless you didn't mean to
>>>>>>> #include it in main.cpp)
>>>>>>>
>>>>>>>
>>>>>>> I guess all I can do is warn you how bad of an idea this is.  For
>>>>>>> starters, I already found a bug in your code ;-)
>>>>>>>
>>>>>>> // stdint.h
>>>>>>> typedef int                int32_t;
>>>>>>>
>>>>>>> // winnt.h
>>>>>>> typedef long LONG;
>>>>>>>
>>>>>>> // windef.h
>>>>>>> typedef struct tagPOINT
>>>>>>> {
>>>>>>>     LONG  x;   // long x
>>>>>>>     LONG  y;   // long y
>>>>>>> } POINT, *PPOINT, NEAR *NPPOINT, FAR *LPPOINT;
>>>>>>>
>>>>>>> // win32.h
>>>>>>> typedef int32_t LONG;
>>>>>>>
>>>>>>> struct POINT
>>>>>>> {
>>>>>>>     LONG x;   // int x
>>>>>>>     LONG y;   // int y
>>>>>>> };
>>>>>>>
>>>>>>> So POINT is defined two different ways.  In your minimal interface,
>>>>>>> it's declared as 2 int32's, which are int.  In the actual Windows header
>>>>>>> files, it's declared as 2 longs.
>>>>>>>
>>>>>>> This might seem like an unimportant bug since int and long are the
>>>>>>> same size, but int and long also mangle differently and affect overload
>>>>>>> resolution, so you could have weird linker errors or call the wrong
>>>>>>> function overload.
>>>>>>>
>>>>>>> Plus, it illustrates the fact that this struct *actually is* a
>>>>>>> different type from the one in the windows header.
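To make the int versus long point concrete, here is a minimal sketch. The names `describe`, `Win32HeaderLong`, `WindowsHeaderLong` and `checkOverloads` are hypothetical and exist only for this illustration; the two typedefs stand in for the two `LONG` definitions quoted above, assuming `int32_t` is `int` as in the quoted stdint.h line.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// int and long are distinct types even on targets where they have the
// same size (as on Windows), so they can select different overloads.
static_assert(!std::is_same<int, long>::value,
              "int and long are distinct types");

// Hypothetical overload set: the call is resolved by type, not by size.
inline const char *describe(int) { return "int"; }
inline const char *describe(long) { return "long"; }

// Hypothetical stand-ins for the two LONG typedefs discussed above.
typedef int Win32HeaderLong;    // like `typedef int32_t LONG;` in win32.h
typedef long WindowsHeaderLong; // like `typedef long LONG;` in winnt.h

// Checks that each typedef picks a different overload.
inline void checkOverloads() {
  Win32HeaderLong A = 0;
  WindowsHeaderLong B = 0;
  assert(std::strcmp(describe(A), "int") == 0);
  assert(std::strcmp(describe(B), "long") == 0);
}
```

The same distinction shows up in mangled names: under the MSVC scheme `void f(int)` is `?f@@YAXH@Z` while `void f(long)` is `?f@@YAXJ@Z`, so a declaration built from one `LONG` will not link against a definition built from the other.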
>>>>>>>
>>>>>>> You said at the end that you never intentionally import win32.h and
>>>>>>> windows.h from the same translation unit.  But then in this example you
>>>>>>> did.  I wonder if you could enforce that by doing this:
>>>>>>>
>>>>>>> // win32.h
>>>>>>> #pragma once
>>>>>>>
>>>>>>> // Error if windows.h was included before us.
>>>>>>> #if defined(_WINDOWS_)
>>>>>>> #error "You're including win32.h after having already included windows.h.  Don't do this!"
>>>>>>> #endif
>>>>>>>
>>>>>>> // And also make sure windows.h can't get included after us
>>>>>>> #define _WINDOWS_
>>>>>>>
>>>>>>> For the record, I tried the test case you linked when windows.h is
>>>>>>> not included in main.cpp and it works (but still has the bug about int and
>>>>>>> long).
>>>>>>>
>>>>>>> On Mon, Jan 22, 2018 at 2:23 PM Leonardo Santagada <
>>>>>>> santagada at gmail.com> wrote:
>>>>>>>
>>>>>>>> It is super gross, but we copy parts of windows.h because having
>>>>>>>> all of it is both gigantic and very very messy. So our win32.h has a couple
>>>>>>>> thousand lines and not 30k+ for windows.h and we try to have zero
>>>>>>>> macros. Win32.h doesn't include windows.h so using ::BOOL wouldn't work. We
>>>>>>>> don't want to create a namespace, we just want a cleaner interface to
>>>>>>>> windows api. The namespace with c linkage is the way to trick cl into
>>>>>>>> allowing us to in some files have both windows.h and Win32.h. I really
>>>>>>>> don't see any way for us to have this Win32.h without this cl support, so
>>>>>>>> maybe we should either put windows.h in a compiled header somewhere and not
>>>>>>>> care that it is infecting everything or just have one place we can call to
>>>>>>>> clean up after including windows.h (a massive set of undefs).
>>>>>>>>
>>>>>>>> So using can't work, because we never intentionally import
>>>>>>>> windows.h and win32.h on the same translation unit.
>>>>>>>>
>>>>>>>> On Mon, Jan 22, 2018 at 7:08 PM, Zachary Turner <zturner at google.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> This is pretty gross, honestly :)
>>>>>>>>>
>>>>>>>>> Can't you just use using declarations?
>>>>>>>>>
>>>>>>>>> namespace Win32 {
>>>>>>>>> extern "C" {
>>>>>>>>>
>>>>>>>>> using ::BOOL;
>>>>>>>>> using ::LONG;
>>>>>>>>> using ::POINT;
>>>>>>>>> using ::LPPOINT;
>>>>>>>>>
>>>>>>>>> using ::GetCursorPos;
>>>>>>>>> }
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> This works with clang-cl.
>>>>>>>>>
>>>>>>>>> On Mon, Jan 22, 2018 at 5:39 AM Leonardo Santagada <
>>>>>>>>> santagada at gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Here is a minimal example. We do this so we don't have to
>>>>>>>>>> import the whole windows api everywhere.
>>>>>>>>>>
>>>>>>>>>> https://gist.github.com/santagada/7977e929d31c629c4bf18ebb987f6be3
>>>>>>>>>>
>>>>>>>>>> On Sun, Jan 21, 2018 at 2:31 AM, Zachary Turner <
>>>>>>>>>> zturner at google.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Clang-cl maintains compatibility with msvc even in cases where
>>>>>>>>>>> it’s non standards compliant (eg 2 phase name lookup), but we try to keep
>>>>>>>>>>> these cases few and far between.
>>>>>>>>>>>
>>>>>>>>>>> To help me understand your case, do you mean you copy windows.h
>>>>>>>>>>> and modify it? How does this lead to the same struct being defined twice?
>>>>>>>>>>> If I were to write this:
>>>>>>>>>>>
>>>>>>>>>>> struct Foo {};
>>>>>>>>>>> struct Foo {};
>>>>>>>>>>>
>>>>>>>>>>> Is this a small repro of the issue you’re talking about?
>>>>>>>>>>>
>>>>>>>>>>> On Sat, Jan 20, 2018 at 3:44 PM Leonardo Santagada <
>>>>>>>>>>> santagada at gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I can totally see something like incremental linking with a
>>>>>>>>>>>> simple padding between obj and a mapping file (which can also help with
>>>>>>>>>>>> edit and continue, something we also would love to have).
>>>>>>>>>>>>
>>>>>>>>>>>> We have another developer doing the port to support clang-cl,
>>>>>>>>>>>> but although most of our code also goes through a version of clang,
>>>>>>>>>>>> migrating the rest to clang-cl has been a fight. From what I heard the main
>>>>>>>>>>>> problem is that we have a copy of parts of windows.h (so as not to bring the
>>>>>>>>>>>> awful parts of it like lower case macros) and that totally works on cl, but
>>>>>>>>>>>> clang (at least 6.0) complains about two struct/vars with the same name,
>>>>>>>>>>>> even though they are exactly the same. Making clang-cl as broken as cl.exe
>>>>>>>>>>>> is not an option I suppose? I would love to turn on a flag
>>>>>>>>>>>> --accept-that-cl-made-bad-decisions-and-live-with-it and have
>>>>>>>>>>>> this at least until this is completely fixed in our code base.
>>>>>>>>>>>>
>>>>>>>>>>>> The biggest win with moving to clang-cl would be a better, more
>>>>>>>>>>>> standards compliant compiler, no 1 minute compiles on heavily templated
>>>>>>>>>>>> files and maybe the holy grail of ThinLTO.
>>>>>>>>>>>>
>>>>>>>>>>>> On Sat, Jan 20, 2018 at 10:56 PM, Zachary Turner <
>>>>>>>>>>>> zturner at google.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> 10-15s will be hard without true incremental linking.
>>>>>>>>>>>>>
>>>>>>>>>>>>> At some point that's going to be the only way to get any
>>>>>>>>>>>>> faster, but incremental linking is hard (putting it lightly), and since our
>>>>>>>>>>>>> full links are already really fast we think we can get reasonably close to
>>>>>>>>>>>>> link.exe incremental speeds with full links.  But it's never enough and I
>>>>>>>>>>>>> will always want it to be faster, so you may see incremental linking in the
>>>>>>>>>>>>> future after we hit a performance wall with full link speed :)
>>>>>>>>>>>>>
>>>>>>>>>>>>> In any case, I'm definitely interested in seeing what kind of
>>>>>>>>>>>>> numbers you get with /debug:ghash after you get this llvm-objcopy feature
>>>>>>>>>>>>> implemented.  So keep me updated :)
>>>>>>>>>>>>>
>>>>>>>>>>>>> As an aside, have you tried building with clang instead of
>>>>>>>>>>>>> cl?  If you build with clang you wouldn't even have to do this llvm-objcopy
>>>>>>>>>>>>> work, because it would "just work".  If you've tried but ran into issues
>>>>>>>>>>>>> I'm interested in hearing about those too.  On the other hand, it's also
>>>>>>>>>>>>> reasonable to only switch one thing at a time.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Sat, Jan 20, 2018 at 1:34 PM Leonardo Santagada <
>>>>>>>>>>>>> santagada at gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> If we get to < 30s I think most users would prefer it to
>>>>>>>>>>>>>> link.exe, just hoping there are still some more optimizations to get closer
>>>>>>>>>>>>>> to ELF linking times (around 10-15s here).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Sat, Jan 20, 2018 at 9:50 PM, Zachary Turner <
>>>>>>>>>>>>>> zturner at google.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Generally speaking a good rule of thumb is that /debug:ghash
>>>>>>>>>>>>>>> will be close to or faster than /debug:fastlink, but with none of the
>>>>>>>>>>>>>>> penalties like slow debug time
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Sat, Jan 20, 2018 at 12:44 PM Zachary Turner <
>>>>>>>>>>>>>>> zturner at google.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Chrome is actually one of my exact benchmark cases. When
>>>>>>>>>>>>>>>> building blink_core.dll and browser_tests.exe, I get anywhere from a 20-40%
>>>>>>>>>>>>>>>> reduction in link time. We have some other optimizations in the pipeline
>>>>>>>>>>>>>>>> but not upstream yet.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> My best time so far (including other optimizations not yet
>>>>>>>>>>>>>>>> upstream) is 28s on blink_core.dll, compared to 110s with /debug
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Sat, Jan 20, 2018 at 12:28 PM Leonardo Santagada <
>>>>>>>>>>>>>>>> santagada at gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Sat, Jan 20, 2018 at 9:05 PM, Zachary Turner <
>>>>>>>>>>>>>>>>> zturner at google.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> You probably don't want to go down the same route that
>>>>>>>>>>>>>>>>>> clang goes through to write the object file.  If you think yaml2coff is
>>>>>>>>>>>>>>>>>> convoluted, the way clang does it will just give you a headache.  There are
>>>>>>>>>>>>>>>>>> multiple abstractions involved to account for different object file formats
>>>>>>>>>>>>>>>>>> (ELF, COFF, MachO) and output formats (Assembly, binary file).  At least
>>>>>>>>>>>>>>>>>> with yaml2coff
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I think your phrase got cut there, but yeah I just found
>>>>>>>>>>>>>>>>> AsmPrinter.cpp and it is convoluted.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> It's true that yaml2coff is using the COFFParser
>>>>>>>>>>>>>>>>>> structure, but if you look at the writeCOFF function in
>>>>>>>>>>>>>>>>>> yaml2coff it's pretty bare-metal.  The logic you need will be almost
>>>>>>>>>>>>>>>>>> identical, except that instead of checking the COFFParser for the various
>>>>>>>>>>>>>>>>>> fields, you'll check the existing COFFObjectFile, which should have similar
>>>>>>>>>>>>>>>>>> fields.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> The only thing you need to do differently is when writing the
>>>>>>>>>>>>>>>>>> section table and section contents, to insert a new entry.  Since
>>>>>>>>>>>>>>>>>> you're injecting a section into the middle, you'll also probably need to
>>>>>>>>>>>>>>>>>> push back the file pointer of all subsequent sections so that they don't
>>>>>>>>>>>>>>>>>> overlap.  (e.g. if the original sections are 1, 2, 3, 4, 5 and you insert
>>>>>>>>>>>>>>>>>> between 2 and 3, then the original sections 3, 4, and 5 would need to have
>>>>>>>>>>>>>>>>>> their FilePointerToRawData offset by the size of the new section).
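The "insert in the middle and push back the later sections" step can be sketched like this. It is a rough illustration, not code from llvm-objcopy or yaml2coff: `Section` and `insertSectionAfter` are hypothetical names, the struct keeps only the fields needed here, and `PointerToRawData` is the PE/COFF spec's name for what the message calls FilePointerToRawData.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Simplified, hypothetical stand-in for a COFF section header.
struct Section {
  std::string Name;
  std::uint32_t PointerToRawData;
  std::uint32_t SizeOfRawData;
};

// Inserts NewSec right after Sections[Index]. The new section is placed
// where the previous one ends, and every later section has its file
// pointer pushed back by the size of the new section.
inline void insertSectionAfter(std::vector<Section> &Sections,
                               std::size_t Index, Section NewSec) {
  assert(Index < Sections.size() && "Index must name an existing section");
  const Section &Prev = Sections[Index];
  NewSec.PointerToRawData = Prev.PointerToRawData + Prev.SizeOfRawData;
  for (std::size_t I = Index + 1; I < Sections.size(); ++I)
    Sections[I].PointerToRawData += NewSec.SizeOfRawData;
  Sections.insert(Sections.begin() + static_cast<std::ptrdiff_t>(Index + 1),
                  NewSec);
}
```

This sketch only moves raw-data pointers. A real tool would presumably also have to adjust anything else stored as a file offset or a section index, such as relocation pointers, the symbol table offset, and the 1-based section numbers recorded in symbols, none of which is shown here.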
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I have the PE/COFF spec open here and I'm happy that I
>>>>>>>>>>>>>>>>> read a bit of it so I actually know what you are talking about... yeah it
>>>>>>>>>>>>>>>>> doesn't seem too complicated.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> If you need to know what values to put for the other
>>>>>>>>>>>>>>>>>> fields in a section header, run `dumpbin /headers foo.obj` on a
>>>>>>>>>>>>>>>>>> clang-generated object file that has a .debug$H section already (e.g. run
>>>>>>>>>>>>>>>>>> clang with -emit-codeview-ghash-section, and look at the properties of the
>>>>>>>>>>>>>>>>>> .debug$H section and use the same values).
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks I will do that and then also look at how the
>>>>>>>>>>>>>>>>> CodeView part of the code does it if I can't understand some of it.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> The only invariant that needs to be maintained is that
>>>>>>>>>>>>>>>>>> Section[N]->FilePointerOfRawData == Section[N-1]->FilePointerOfRawData +
>>>>>>>>>>>>>>>>>> Section[N-1]->SizeOfRawData
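The invariant as stated can be checked mechanically. The following is a small sketch with hypothetical names (`SectionLayout`, `isContiguous`); it tests exactly the condition quoted above and deliberately ignores relocation records, which a later message in this thread adds to the size of each section.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified, hypothetical view of the two header fields in the invariant.
struct SectionLayout {
  std::uint32_t PointerToRawData;
  std::uint32_t SizeOfRawData;
};

// Returns true when every section starts exactly where the previous one
// ends, i.e. Section[N].Pointer == Section[N-1].Pointer + Section[N-1].Size.
inline bool isContiguous(const std::vector<SectionLayout> &Sections) {
  for (std::size_t N = 1; N < Sections.size(); ++N) {
    const SectionLayout &Prev = Sections[N - 1];
    if (Sections[N].PointerToRawData !=
        Prev.PointerToRawData + Prev.SizeOfRawData)
      return false;
  }
  return true;
}
```

Running a check like this over the section table before and after patching an object file is one cheap way to catch an offset that was not pushed back.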
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Well, that and all the sections need to be on the final
>>>>>>>>>>>>>>>>> file... But I'm hopeful.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Does anyone have times for linking a big project like chrome with
>>>>>>>>>>>>>>>>> this so that at least I know what kind of performance to expect?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> My numbers are something like:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 1 pdb per obj file: link.exe takes ~15 minutes and 16GB of
>>>>>>>>>>>>>>>>> ram, lld-link.exe takes 2:30 minutes and ~8GB of ram
>>>>>>>>>>>>>>>>> around 10 pdbs per folder: link.exe takes 1 minute and
>>>>>>>>>>>>>>>>> 2-3GB of ram, lld-link.exe takes 1:30 minutes and ~6GB of ram
>>>>>>>>>>>>>>>>> fastlink: link.exe takes 40 seconds, but then 20 seconds of
>>>>>>>>>>>>>>>>> loading at the first break point in the debugger and we lost DIA support
>>>>>>>>>>>>>>>>> for listing symbols.
>>>>>>>>>>>>>>>>> incremental: link.exe takes 8 seconds, but it only happens
>>>>>>>>>>>>>>>>> when very minor changes happen.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> We have a non-negligible number of symbols used on some
>>>>>>>>>>>>>>>>> runtime systems.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Sat, Jan 20, 2018 at 11:52 AM Leonardo Santagada <
>>>>>>>>>>>>>>>>>> santagada at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thanks for the tips, I now have something that reads the
>>>>>>>>>>>>>>>>>>> obj file, finds .debug$T sections and global hashes it (proof of concept
>>>>>>>>>>>>>>>>>>> kind of code). What I can't find is: how does clang itself write the coff
>>>>>>>>>>>>>>>>>>> files with global hashes, as that might help me understand how to create
>>>>>>>>>>>>>>>>>>> the .debug$H section, how to update the file section count and how to
>>>>>>>>>>>>>>>>>>> properly write this back.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> The code on yaml2coff is expecting to be working on the
>>>>>>>>>>>>>>>>>>> yaml COFFParser struct and I'm having quite a bit of a headache turning the
>>>>>>>>>>>>>>>>>>> COFFObjectFile into a COFFParser object or compatible... Tomorrow I might
>>>>>>>>>>>>>>>>>>> try the very non efficient path of coff2yaml and then yaml2coff with the
>>>>>>>>>>>>>>>>>>> hashes header... but it seems way too inefficient and convoluted.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Fri, Jan 19, 2018 at 10:38 PM, Zachary Turner <
>>>>>>>>>>>>>>>>>>> zturner at google.com> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Fri, Jan 19, 2018 at 1:02 PM Leonardo Santagada <
>>>>>>>>>>>>>>>>>>>> santagada at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Fri, Jan 19, 2018 at 9:44 PM, Zachary Turner <
>>>>>>>>>>>>>>>>>>>>> zturner at google.com> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Fri, Jan 19, 2018 at 12:29 PM Leonardo Santagada <
>>>>>>>>>>>>>>>>>>>>>> santagada at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> No I didn't, I used cl.exe from the visual studio
>>>>>>>>>>>>>>>>>>>>>>> toolchain. What I'm proposing is a tool for processing .obj files in COFF
>>>>>>>>>>>>>>>>>>>>>>> format, reading them and generating the GHASH part.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> To make our build faster we use hundreds of unity
>>>>>>>>>>>>>>>>>>>>>>> build files (.cpp's with a lot of other .cpp's in them aka munch files) but
>>>>>>>>>>>>>>>>>>>>>>> still have a lot of single .cpp's as well (in total something like 3.4k
>>>>>>>>>>>>>>>>>>>>>>> .obj files).
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> ps: sorry for sending to the wrong list, I was
>>>>>>>>>>>>>>>>>>>>>>> reading about llvm mailing lists and jumped when I saw what I thought was a
>>>>>>>>>>>>>>>>>>>>>>> lld exclusive list.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> A tool like this would be useful, yes.  We've talked
>>>>>>>>>>>>>>>>>>>>>> about it internally as well and agreed it would be useful, we just haven't
>>>>>>>>>>>>>>>>>>>>>> prioritized it.  If you're interested in submitting a patch along those
>>>>>>>>>>>>>>>>>>>>>> lines though, I think it would be a good addition.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> I'm not sure what the best place for it would be.
>>>>>>>>>>>>>>>>>>>>>> llvm-readobj and llvm-objdump seem like obvious choices, but they are
>>>>>>>>>>>>>>>>>>>>>> intended to be read-only, so perhaps they wouldn't be a good fit.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> llvm-pdbutil is kind of a hodgepodge of everything
>>>>>>>>>>>>>>>>>>>>>> else related to PDBs and symbols, so I wouldn't be opposed to making a new
>>>>>>>>>>>>>>>>>>>>>> subcommand there called "ghash" or something that could process an object
>>>>>>>>>>>>>>>>>>>>>> file and output a new object file with a .debug$H section.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> A third option would be to make a new tool for it.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> I don't think it would be that hard to write.  If
>>>>>>>>>>>>>>>>>>>>>> you're interested in trying to make a patch for this, I can offer some
>>>>>>>>>>>>>>>>>>>>>> guidance on where to look in the code.  Otherwise it's something that we'll
>>>>>>>>>>>>>>>>>>>>>> probably get to, I'm just not sure when.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I would love to write it and contribute it back,
>>>>>>>>>>>>>>>>>>>>> please do tell. I did find some of the code of ghash in lld, but I'm fuzzy
>>>>>>>>>>>>>>>>>>>>> on the llvm codeview part of it and have never seen llvm-readobj/objdump or
>>>>>>>>>>>>>>>>>>>>> llvm-pdbutil, but I'm not afraid to look :)
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Luckily all of the important code is hidden behind
>>>>>>>>>>>>>>>>>>>> library calls, and it should already just do the right thing, so I suspect
>>>>>>>>>>>>>>>>>>>> you won't need to know much about CodeView to do this.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I think Peter has the right idea about putting this in
>>>>>>>>>>>>>>>>>>>> llvm-objcopy.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> You can look at one of the existing CopyBinary
>>>>>>>>>>>>>>>>>>>> functions there, which currently only work for ELF, but you can just make a
>>>>>>>>>>>>>>>>>>>> new overload that accepts a COFFObjectFile.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I would probably start by iterating over each of the
>>>>>>>>>>>>>>>>>>>> sections (getNumberOfSections / getSectionName) looking for .debug$T and
>>>>>>>>>>>>>>>>>>>> .debug$H sections.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> If you find a .debug$H section then you can just skip
>>>>>>>>>>>>>>>>>>>> that object file.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> If you find a .debug$T but not a .debug$H, then
>>>>>>>>>>>>>>>>>>>> basically do the same thing that LLD does in PDBLinker::mergeDebugT
>>>>>>>>>>>>>>>>>>>> (create a CVTypeArray, and pass it to GloballyHashedType::hashTypes).
>>>>>>>>>>>>>>>>>>>> That will return an array of hash values.  (the format of .debug$H is the
>>>>>>>>>>>>>>>>>>>> header, followed by the hash values).  Then when you're writing the list of
>>>>>>>>>>>>>>>>>>>> sections, just add in the .debug$H section right after the .debug$T section.
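The decision logic in the three paragraphs above (skip if .debug$H exists, otherwise add it right after .debug$T) can be sketched without touching any LLVM API. Everything named here (`GhashAction`, `GhashPlan`, `planGhash`) is hypothetical; in a real patch the section names would come from the COFFObjectFile, and the hashing itself, which the message delegates to GloballyHashedType::hashTypes, is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// What to do with one object file (hypothetical names for this sketch).
enum class GhashAction { Skip, NothingToHash, AddAfter };

struct GhashPlan {
  GhashAction Action;
  std::size_t DebugTIndex; // meaningful only when Action == AddAfter
};

// Scans the section names once. An existing .debug$H means the file can
// be skipped; otherwise the first .debug$T marks where .debug$H should be
// inserted (right after it).
inline GhashPlan planGhash(const std::vector<std::string> &SectionNames) {
  bool HasDebugT = false;
  std::size_t DebugTIndex = 0;
  for (std::size_t I = 0; I < SectionNames.size(); ++I) {
    if (SectionNames[I] == ".debug$H")
      return {GhashAction::Skip, 0};
    if (SectionNames[I] == ".debug$T" && !HasDebugT) {
      HasDebugT = true;
      DebugTIndex = I;
    }
  }
  if (!HasDebugT)
    return {GhashAction::NothingToHash, 0};
  return {GhashAction::AddAfter, DebugTIndex};
}
```

Separating the "what should happen" decision from the file rewriting keeps the part that is easy to test away from the part that has to get every header offset right.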
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Currently llvm-objcopy only writes ELF files, so it
>>>>>>>>>>>>>>>>>>>> would need to be taught to write COFF files.  We have code to do this in
>>>>>>>>>>>>>>>>>>>> the yaml2obj utility (specifically, in yaml2coff.cpp in the function
>>>>>>>>>>>>>>>>>>>> writeCOFF).  There may be a way to move this code to somewhere else
>>>>>>>>>>>>>>>>>>>> (llvm/Object/COFF.h?) so that it can be re-used by both yaml2coff and
>>>>>>>>>>>>>>>>>>>> llvm-objcopy, but in the worst case scenario you could copy the code and
>>>>>>>>>>>>>>>>>>>> re-write it to work with these new structures.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Lastly, you'll probably want to put all of this behind
>>>>>>>>>>>>>>>>>>>> an option in llvm-objcopy such as -add-codeview-ghash-section

-- 

Leonardo Santagada

Zachary Turner via llvm-dev

2018-Jan-25 17:50 UTC

[llvm-dev] [lldb-dev] Trying out lld to link windows binaries (using msvc as a compiler)

Yea as long as you compare clang-cl object file with automatically
generated .debug$H section against clang-cl object file without .debug$H
but added after the fact with llvm-objcopy, that should expose the problem
I think when you run obj2yaml on them.

On Thu, Jan 25, 2018 at 9:49 AM Leonardo Santagada <santagada at
gmail.com>
wrote:
> I did reorder my sections, so that .debug$H is in the correct place, but
> now I get some errors on dubplicate symbols, I created a folder with
> examples:
>
> https://www.dropbox.com/sh/nmvzi44pi0boe76/AAA0f47O5PCJ9JiUc6wVuwBra?dl=0
>
> t.obj is generated by vs 2015 and it links fine with lld-link.exe, but
> tout.obj gives this errors:
>
> lld-link.exe /DEBUG:GHASH tout.obj
> LLD-LINK.EXE: error: duplicate symbol: __local_stdio_printf_options in
> tout.obj and in LIBCMT.lib(default_local_stdio_options.obj)
> LLD-LINK.EXE: error: duplicate symbol: __local_stdio_printf_options in
> tout.obj and in libvcruntime.lib(undname.obj)
>
> I'm using PEView from http://wjradburn.com/software/ to look at the
files
> and can't see anything wrong, except some valid differences in the
offsets
> being used for the data (so pointer to data is different between them).
>
> I will look into yaml2obj now to see if I see anything else weird going on.
>
>
> On Thu, Jan 25, 2018 at 6:41 PM, Zachary Turner <zturner at
google.com>
> wrote:
>
>> I'm pretty confident that cl is not putting anything strange in the
>> .debug$T sections.  We've done a lot of testing and never seen
anything
>> except CodeView type records in a .debug$T.  My hunch is that your
objcopy
>> patch is probably not doing the right thing in one or more of the
section
>> headers, and this is confusing the linker.
>>
>> One idea might be to build a simple object file with clang-cl but
without
>> the magic -mllvm -emit-codeview-ghash-section, then run your
llvm-objcopy
>> on it.  Then build the same object file passing -mllvm
>> -emit-codeview-ghash-section.  Then run obj2yaml on both and diff the
>> results.  They should be byte-for-byte identical.  That should give you
a
>> clue about if objcopy is doing something wrong.
>>
>> On Thu, Jan 25, 2018 at 2:21 AM Leonardo Santagada <santagada at
gmail.com>
>> wrote:
>>
>>> Don't worry, I definetly want to perfect this to generate legal
obj
>>> files, this is just to speed up testing.
>>>
>>> Now after patching all the obj files I get this errors when linking
a
>>> small part of our code base (msvc 2017 15.5.3, lld and llvm-objcopy
7.0.0):
>>> lld-link.exe : error : relocation against symbol in discarded
section:
>>> $LN8
>>> lld-link.exe : error : relocation against symbol in discarded
section:
>>> $LN43
>>> lld-link.exe : error : relocation against symbol in discarded
section:
>>> $LN37
>>>
>>> I'm starting to guess that cl.exe might be putting some random
comdat or
>>> other discardable symbols in the .debug$T and clang doesn't? I
will try to
>>> debug this and see what more I can uncover.
>>>
>>> Linking works perfectly without my llvm-objcopy pass to add
.debug$H?
>>>
>>>
>>> On Thu, Jan 25, 2018 at 1:53 AM, Zachary Turner <zturner at
google.com>
>>> wrote:
>>>
>>>> It might not influence LLD, but at the same time we don't
want to
>>>> upstream something that is producing technically illegal COFF
files.  Also
>>>> good to hear about the planned changes to your header files. 
Looking
>>>> forward to hearing about your experiences with clang-cl.
>>>>
>>>> On Wed, Jan 24, 2018 at 10:41 AM Leonardo Santagada <
>>>> santagada at gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I finally got my first .obj file patched with .debug$H to
look
>>>>> somewhat right. I added the new section at the end of the
file so I don't
>>>>> have to recalculate all sections (although now I probably
could position it
>>>>> in the middle, knowing that each section is: SizeOfRawData
+
>>>>> (last.Header.NumberOfRelocations * (4+4+2)) and the $H
needs to come right
>>>>> after $T in the file). That although illegal based on the
coff specs
>>>>> doesn't seem its going to influence lld.
>>>>>
>>>>> Also we talked and we are probably going to do something
similar to a
>>>>> bunch of windows defines and a check for our own define (to
guarantee that
>>>>> no one imported windows.h before win32.h) and drop the
namespace and the
>>>>> conflicting names.
>>>>>
>>>>>
>>>>> On Tue, Jan 23, 2018 at 12:46 AM, Zachary Turner
<zturner at google.com>
>>>>> wrote:
>>>>>
>>>>>> It's very possible that a 3rd party indirect header include is
>>>>>> involved.  One idea might be like I suggested, where you #define
>>>>>> _WINDOWS_ in win32.h and guarantee that it's always included first.
>>>>>> Then those other headers won't be able to #include <windows.h>.  But
>>>>>> it will probably greatly expand the amount of stuff you have to add
>>>>>> to win32.h, as you will probably find some callers of functions that
>>>>>> aren't yet in your win32.h that you'd have to add.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Mon, Jan 22, 2018 at 3:28 PM Leonardo Santagada <
>>>>>> santagada at gmail.com> wrote:
>>>>>>
>>>>>>> Ok some information was lost on getting this example to you, I'm
>>>>>>> sorry for not being clear.
>>>>>>>
>>>>>>> We have a huge code base, let's say 90% of it doesn't include either
>>>>>>> header, 9% include win32.h and 1% includes both, I will try to
>>>>>>> discover why, but my guess is they include both a third party that
>>>>>>> includes windows.h and some of our libs that use win32.h.
>>>>>>>
>>>>>>> I will try to fully understand this tomorrow.
>>>>>>>
>>>>>>> I guess clang will not implement this ever so finishing the object
>>>>>>> copier is the best solution until all code is ported to clang.
>>>>>>>
>>>>>>> On 23 Jan 2018 00:02, "Zachary Turner" <zturner at google.com> wrote:
>>>>>>>
>>>>>>>> You said win32.h doesn't include windows.h, but main.cpp does.  So
>>>>>>>> what's the disadvantage of just including it in win32.h anyway,
>>>>>>>> since it's already going to be in every translation unit?  (Unless
>>>>>>>> you didn't mean to #include it in main.cpp)
>>>>>>>>
>>>>>>>>
>>>>>>>> I guess all I can do is warn you how bad of an idea this is.  For
>>>>>>>> starters, I already found a bug in your code ;-)
>>>>>>>>
>>>>>>>> // stdint.h
>>>>>>>> typedef int                int32_t;
>>>>>>>>
>>>>>>>> // winnt.h
>>>>>>>> typedef long LONG;
>>>>>>>>
>>>>>>>> // windef.h
>>>>>>>> typedef struct tagPOINT
>>>>>>>> {
>>>>>>>>     LONG  x;   // long x
>>>>>>>>     LONG  y;   // long y
>>>>>>>> } POINT, *PPOINT, NEAR *NPPOINT, FAR *LPPOINT;
>>>>>>>>
>>>>>>>> // win32.h
>>>>>>>> typedef int32_t LONG;
>>>>>>>>
>>>>>>>> struct POINT
>>>>>>>> {
>>>>>>>> LONG x;   // int x
>>>>>>>> LONG y;   // int y
>>>>>>>> };
>>>>>>>>
>>>>>>>> So POINT is defined two different ways.  In your minimal interface,
>>>>>>>> it's declared as 2 int32's, which are int.  In the actual Windows
>>>>>>>> header files, it's declared as 2 longs.
>>>>>>>>
>>>>>>>> This might seem like an unimportant bug since int and long are the
>>>>>>>> same size, but int and long also mangle differently and affect
>>>>>>>> overload resolution, so you could have weird linker errors or call
>>>>>>>> the wrong function overload.
>>>>>>>>
>>>>>>>> Plus, it illustrates the fact that this struct *actually is* a
>>>>>>>> different type from the one in the windows header.
>>>>>>>>
>>>>>>>> You said at the end that you never intentionally import win32.h and
>>>>>>>> windows.h from the same translation unit.  But then in this example
>>>>>>>> you did.  I wonder if you could enforce that by doing this:
>>>>>>>>
>>>>>>>> // win32.h
>>>>>>>> #pragma once
>>>>>>>>
>>>>>>>> // Error if windows.h was included before us.
>>>>>>>> #if defined(_WINDOWS_)
>>>>>>>> #error "You're including win32.h after having already included windows.h.  Don't do this!"
>>>>>>>> #endif
>>>>>>>>
>>>>>>>> // And also make sure windows.h can't get included after us
>>>>>>>> #define _WINDOWS_
>>>>>>>>
>>>>>>>> For the record, I tried the test case you linked when windows.h is
>>>>>>>> not included in main.cpp and it works (but still has the bug about
>>>>>>>> int and long).
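The int/long point made above can be checked with a few lines of standard C++. This is a minimal sketch, not code from the thread: `describe`, `WindowsLong` and `Win32hLong` are hypothetical names standing in for an overloaded API and for the two competing `LONG` typedefs.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// int and long are distinct types even where both are 32 bits wide,
// as they are with MSVC on Windows.
static_assert(!std::is_same<int, long>::value,
              "int and long are different types");

// Hypothetical overload set, standing in for any API overloaded on both.
inline std::string describe(int)  { return "int"; }
inline std::string describe(long) { return "long"; }

// Hypothetical stand-ins for the two LONG typedefs discussed above.
using WindowsLong = long;  // winnt.h: typedef long LONG;
using Win32hLong  = int;   // win32.h: typedef int32_t LONG; (int32_t is int here)
```

Calling `describe` with a `WindowsLong` selects the `long` overload, while a `Win32hLong` selects the `int` overload, so the same source line can bind to different functions (and different mangled names) depending on which header supplied `LONG`.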
>>>>>>>>
>>>>>>>> On Mon, Jan 22, 2018 at 2:23 PM Leonardo Santagada <
>>>>>>>> santagada at gmail.com> wrote:
>>>>>>>>
>>>>>>>>> It is super gross, but we copy parts of windows.h because having
>>>>>>>>> all of it is both gigantic and very very messy. So our win32.h has
>>>>>>>>> a couple thousand lines and not 30k+ for windows.h and we try to
>>>>>>>>> have zero macros. Win32.h doesn't include windows.h so using ::BOOL
>>>>>>>>> wouldn't work. We don't want to create a namespace, we just want a
>>>>>>>>> cleaner interface to windows api. The namespace with c linkage is
>>>>>>>>> the way to trick cl into allowing us to in some files have both
>>>>>>>>> windows.h and Win32.h. I really don't see any way for us to have
>>>>>>>>> this Win32.h without this cl support, so maybe we should either put
>>>>>>>>> windows.h in a precompiled header somewhere and not care that it is
>>>>>>>>> infecting everything or just have one place we can call to clean up
>>>>>>>>> after including windows.h (a massive set of undefs).
>>>>>>>>>
>>>>>>>>> So using can't work, because we never intentionally import
>>>>>>>>> windows.h and win32.h on the same translation unit.
>>>>>>>>>
>>>>>>>>> On Mon, Jan 22, 2018 at 7:08 PM, Zachary Turner <
>>>>>>>>> zturner at google.com> wrote:
>>>>>>>>>
>>>>>>>>>> This is pretty gross, honestly :)
>>>>>>>>>>
>>>>>>>>>> Can't you just use using declarations?
>>>>>>>>>> namespace Win32 {
>>>>>>>>>> extern "C" {
>>>>>>>>>>
>>>>>>>>>> using ::BOOL;
>>>>>>>>>> using ::LONG;
>>>>>>>>>> using ::POINT;
>>>>>>>>>> using ::LPPOINT;
>>>>>>>>>>
>>>>>>>>>> using ::GetCursorPos;
>>>>>>>>>> }
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> This works with clang-cl.
>>>>>>>>>>
>>>>>>>>>> On Mon, Jan 22, 2018 at 5:39 AM Leonardo Santagada <
>>>>>>>>>> santagada at gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Here is a minimal example, we do this so we don't have to
>>>>>>>>>>> import the whole windows api everywhere.
>>>>>>>>>>>
>>>>>>>>>>> https://gist.github.com/santagada/7977e929d31c629c4bf18ebb987f6be3
>>>>>>>>>>>
>>>>>>>>>>> On Sun, Jan 21, 2018 at 2:31 AM, Zachary Turner <
>>>>>>>>>>> zturner at google.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Clang-cl maintains compatibility with msvc even in cases where
>>>>>>>>>>>> it’s non standards compliant (eg 2 phase name lookup), but we
>>>>>>>>>>>> try to keep these cases few and far between.
>>>>>>>>>>>>
>>>>>>>>>>>> To help me understand your case, do you mean you copy windows.h
>>>>>>>>>>>> and modify it? How does this lead to the same struct being
>>>>>>>>>>>> defined twice? If I were to write this:
>>>>>>>>>>>>
>>>>>>>>>>>> struct Foo {};
>>>>>>>>>>>> struct Foo {};
>>>>>>>>>>>>
>>>>>>>>>>>> Is this a small repro of the issue you’re talking about?
>>>>>>>>>>>>
>>>>>>>>>>>> On Sat, Jan 20, 2018 at 3:44 PM Leonardo Santagada <
>>>>>>>>>>>> santagada at gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> I can totally see something like incremental linking with a
>>>>>>>>>>>>> simple padding between obj and a mapping file (which can also
>>>>>>>>>>>>> help with edit and continue, something we also would love to
>>>>>>>>>>>>> have).
>>>>>>>>>>>>>
>>>>>>>>>>>>> We have another developer doing the port to support clang-cl,
>>>>>>>>>>>>> but although most of our code also goes through a version of
>>>>>>>>>>>>> clang, migrating the rest to clang-cl has been a fight. From
>>>>>>>>>>>>> what I heard the main problem is that we have a copy of parts
>>>>>>>>>>>>> of windows.h (so as not to bring in the awful parts of it like
>>>>>>>>>>>>> lower case macros) and that totally works on cl, but clang (at
>>>>>>>>>>>>> least 6.0) complains about two struct/vars with the same name,
>>>>>>>>>>>>> even though they are exactly the same. Making clang-cl as
>>>>>>>>>>>>> broken as cl.exe is not an option I suppose? I would love to
>>>>>>>>>>>>> turn on a flag
>>>>>>>>>>>>> --accept-that-cl-made-bad-decisions-and-live-with-it and have
>>>>>>>>>>>>> this at least until this is completely fixed in our code base.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The biggest win with moving to clang-cl would be a better, more
>>>>>>>>>>>>> standards compliant compiler, no 1 minute compiles on heavily
>>>>>>>>>>>>> templated files and maybe the holy grail of ThinLTO.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Sat, Jan 20, 2018 at 10:56 PM, Zachary Turner <
>>>>>>>>>>>>> zturner at google.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> 10-15s will be hard without true incremental linking.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> At some point that's going to be the only way to get any
>>>>>>>>>>>>>> faster, but incremental linking is hard (putting it lightly),
>>>>>>>>>>>>>> and since our full links are already really fast we think we
>>>>>>>>>>>>>> can get reasonably close to link.exe incremental speeds with
>>>>>>>>>>>>>> full links.  But it's never enough and I will always want it
>>>>>>>>>>>>>> to be faster, so you may see incremental linking in the future
>>>>>>>>>>>>>> after we hit a performance wall with full link speed :)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> In any case, I'm definitely interested in seeing what kind of
>>>>>>>>>>>>>> numbers you get with /debug:ghash after you get this
>>>>>>>>>>>>>> llvm-objcopy feature implemented.  So keep me updated :)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> As an aside, have you tried building with clang instead of
>>>>>>>>>>>>>> cl?  If you build with clang you wouldn't even have to do this
>>>>>>>>>>>>>> llvm-objcopy work, because it would "just work".  If you've
>>>>>>>>>>>>>> tried but ran into issues I'm interested in hearing about
>>>>>>>>>>>>>> those too.  On the other hand, it's also reasonable to only
>>>>>>>>>>>>>> switch one thing at a time.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Sat, Jan 20, 2018 at 1:34 PM Leonardo Santagada <
>>>>>>>>>>>>>> santagada at gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> If we get to < 30s I think most users would prefer it to
>>>>>>>>>>>>>>> link.exe, just hoping there are still some more optimizations
>>>>>>>>>>>>>>> to get closer to ELF linking times (around 10-15s here).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Sat, Jan 20, 2018 at 9:50 PM, Zachary Turner <
>>>>>>>>>>>>>>> zturner at google.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Generally speaking a good rule of thumb is that /debug:ghash
>>>>>>>>>>>>>>>> will be close to or faster than /debug:fastlink, but with
>>>>>>>>>>>>>>>> none of the penalties like slow debug time
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Sat, Jan 20, 2018 at 12:44 PM Zachary Turner <
>>>>>>>>>>>>>>>> zturner at google.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Chrome is actually one of my exact benchmark cases. When
>>>>>>>>>>>>>>>>> building blink_core.dll and browser_tests.exe, I get anywhere
>>>>>>>>>>>>>>>>> from a 20-40% reduction in link time. We have some other
>>>>>>>>>>>>>>>>> optimizations in the pipeline but not upstream yet.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> My best time so far (including other optimizations not yet
>>>>>>>>>>>>>>>>> upstream) is 28s on blink_core.dll, compared to 110s with /debug
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Sat, Jan 20, 2018 at 12:28 PM Leonardo Santagada <
>>>>>>>>>>>>>>>>> santagada at gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Sat, Jan 20, 2018 at 9:05 PM, Zachary Turner <
>>>>>>>>>>>>>>>>>> zturner at google.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> You probably don't want to go down the same route that clang
>>>>>>>>>>>>>>>>>>> goes through to write the object file.  If you think
>>>>>>>>>>>>>>>>>>> yaml2coff is convoluted, the way clang does it will just
>>>>>>>>>>>>>>>>>>> give you a headache.  There are multiple abstractions
>>>>>>>>>>>>>>>>>>> involved to account for different object file formats (ELF,
>>>>>>>>>>>>>>>>>>> COFF, MachO) and output formats (Assembly, binary file).  At
>>>>>>>>>>>>>>>>>>> least with yaml2coff
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I think your phrase got cut there, but yeah I just found
>>>>>>>>>>>>>>>>>> AsmPrinter.cpp and it is convoluted.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> It's true that yaml2coff is using the COFFParser structure,
>>>>>>>>>>>>>>>>>>> but if you look at the writeCOFF function in yaml2coff it's
>>>>>>>>>>>>>>>>>>> pretty bare-metal.  The logic you need will be almost
>>>>>>>>>>>>>>>>>>> identical, except that instead of checking the COFFParser
>>>>>>>>>>>>>>>>>>> for the various fields, you'll check the existing
>>>>>>>>>>>>>>>>>>> COFFObjectFile, which should have similar fields.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> The only thing you need to do differently is when writing
>>>>>>>>>>>>>>>>>>> the section table and section contents, to insert a new
>>>>>>>>>>>>>>>>>>> entry.  Since you're injecting a section into the middle,
>>>>>>>>>>>>>>>>>>> you'll also probably need to push back the file pointer of
>>>>>>>>>>>>>>>>>>> all subsequent sections so that they don't overlap.  (e.g.
>>>>>>>>>>>>>>>>>>> if the original sections are 1, 2, 3, 4, 5 and you insert
>>>>>>>>>>>>>>>>>>> between 2 and 3, then the original sections 3, 4, and 5
>>>>>>>>>>>>>>>>>>> would need to have their FilePointerToRawData offset by the
>>>>>>>>>>>>>>>>>>> size of the new section).
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I have the PE/COFF spec open here and I'm happy that I read a
>>>>>>>>>>>>>>>>>> bit of it so I actually know what you are talking about...
>>>>>>>>>>>>>>>>>> yeah it doesn't seem too complicated.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> If you need to know what values to put for the other fields
>>>>>>>>>>>>>>>>>>> in a section header, run `dumpbin /headers foo.obj` on a
>>>>>>>>>>>>>>>>>>> clang-generated object file that has a .debug$H section
>>>>>>>>>>>>>>>>>>> already (e.g. run clang with -emit-codeview-ghash-section,
>>>>>>>>>>>>>>>>>>> and look at the properties of the .debug$H section and use
>>>>>>>>>>>>>>>>>>> the same values).
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks, I will do that and then also look at how the CodeView
>>>>>>>>>>>>>>>>>> part of the code does it if I can't understand some of it.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> The only invariant that needs to be maintained is that
>>>>>>>>>>>>>>>>>>> Section[N]->FilePointerOfRawData ==
>>>>>>>>>>>>>>>>>>> Section[N-1]->FilePointerOfRawData +
>>>>>>>>>>>>>>>>>>> Section[N-1]->SizeOfRawData
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Well, that and all the sections need to be on the final
>>>>>>>>>>>>>>>>>> file... But I'm hopeful.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Does anyone have times on linking a big project like chrome
>>>>>>>>>>>>>>>>>> with this so that at least I know what kind of performance to
>>>>>>>>>>>>>>>>>> expect?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> My numbers are something like:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 1 pdb per obj file: link.exe takes ~15 minutes and 16GB of
>>>>>>>>>>>>>>>>>> ram, lld-link.exe takes 2:30 minutes and ~8GB of ram
>>>>>>>>>>>>>>>>>> around 10 pdbs per folder: link.exe takes 1 minute and 2-3GB
>>>>>>>>>>>>>>>>>> of ram, lld-link.exe takes 1:30 minutes and ~6GB of ram
>>>>>>>>>>>>>>>>>> fastlink: link.exe takes 40 seconds, but then 20 seconds of
>>>>>>>>>>>>>>>>>> loading at the first break point in the debugger and we lost
>>>>>>>>>>>>>>>>>> DIA support for listing symbols.
>>>>>>>>>>>>>>>>>> incremental: link.exe takes 8 seconds, but it only happens
>>>>>>>>>>>>>>>>>> when very minor changes happen.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> We have a non-negligible number of symbols used on some
>>>>>>>>>>>>>>>>>> runtime systems.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Sat, Jan 20, 2018 at 11:52 AM Leonardo Santagada <
>>>>>>>>>>>>>>>>>>> santagada at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Thanks for the tips, I now have something that reads the obj
>>>>>>>>>>>>>>>>>>>> file, finds .debug$T sections and global hashes it (proof
>>>>>>>>>>>>>>>>>>>> of concept kind of code). What I can't find is: how does
>>>>>>>>>>>>>>>>>>>> clang itself write the coff files with global hashes, as
>>>>>>>>>>>>>>>>>>>> that might help me understand how to create the .debug$H
>>>>>>>>>>>>>>>>>>>> section, how to update the file section count and how to
>>>>>>>>>>>>>>>>>>>> properly write this back.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> The code on yaml2coff is expecting to be working on the
>>>>>>>>>>>>>>>>>>>> yaml COFFParser struct and I'm having quite a bit of a
>>>>>>>>>>>>>>>>>>>> headache turning the COFFObjectFile into a COFFParser
>>>>>>>>>>>>>>>>>>>> object or compatible... Tomorrow I might try the very
>>>>>>>>>>>>>>>>>>>> inefficient path of coff2yaml and then yaml2coff with the
>>>>>>>>>>>>>>>>>>>> hashes header... but it seems way too inefficient and
>>>>>>>>>>>>>>>>>>>> convoluted.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Fri, Jan 19, 2018 at 10:38 PM, Zachary Turner <
>>>>>>>>>>>>>>>>>>>> zturner at google.com> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Fri, Jan 19, 2018 at 1:02 PM Leonardo Santagada <
>>>>>>>>>>>>>>>>>>>>> santagada at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Fri, Jan 19, 2018 at 9:44 PM, Zachary Turner <
>>>>>>>>>>>>>>>>>>>>>> zturner at google.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On Fri, Jan 19, 2018 at 12:29 PM Leonardo Santagada <
>>>>>>>>>>>>>>>>>>>>>>> santagada at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> No I didn't, I used cl.exe from the visual studio toolchain.
>>>>>>>>>>>>>>>>>>>>>>>> What I'm proposing is a tool for processing .obj files in
>>>>>>>>>>>>>>>>>>>>>>>> COFF format, reading them and generating the GHASH part.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> To make our build faster we use hundreds of unity build
>>>>>>>>>>>>>>>>>>>>>>>> files (.cpp's with a lot of other .cpp's in them aka munch
>>>>>>>>>>>>>>>>>>>>>>>> files) but still have a lot of single .cpp's as well (in
>>>>>>>>>>>>>>>>>>>>>>>> total something like 3.4k .obj files).
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> ps: sorry for sending to the wrong list, I was reading
>>>>>>>>>>>>>>>>>>>>>>>> about llvm mailing lists and jumped when I saw what I
>>>>>>>>>>>>>>>>>>>>>>>> thought was a lld exclusive list.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> A tool like this would be useful, yes.  We've talked about
>>>>>>>>>>>>>>>>>>>>>>> it internally as well and agreed it would be useful, we
>>>>>>>>>>>>>>>>>>>>>>> just haven't prioritized it.  If you're interested in
>>>>>>>>>>>>>>>>>>>>>>> submitting a patch along those lines though, I think it
>>>>>>>>>>>>>>>>>>>>>>> would be a good addition.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> I'm not sure what the best place for it would be.
>>>>>>>>>>>>>>>>>>>>>>> llvm-readobj and llvm-objdump seem like obvious choices,
>>>>>>>>>>>>>>>>>>>>>>> but they are intended to be read-only, so perhaps they
>>>>>>>>>>>>>>>>>>>>>>> wouldn't be a good fit.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> llvm-pdbutil is kind of a hodgepodge of everything else
>>>>>>>>>>>>>>>>>>>>>>> related to PDBs and symbols, so I wouldn't be opposed to
>>>>>>>>>>>>>>>>>>>>>>> making a new subcommand there called "ghash" or something
>>>>>>>>>>>>>>>>>>>>>>> that could process an object file and output a new object
>>>>>>>>>>>>>>>>>>>>>>> file with a .debug$H section.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> A third option would be to make a new tool for it.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> I don't think it would be that hard to write.  If you're
>>>>>>>>>>>>>>>>>>>>>>> interested in trying to make a patch for this, I can offer
>>>>>>>>>>>>>>>>>>>>>>> some guidance on where to look in the code.  Otherwise
>>>>>>>>>>>>>>>>>>>>>>> it's something that we'll probably get to, I'm just not
>>>>>>>>>>>>>>>>>>>>>>> sure when.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> I would love to write it and contribute it back, please do
>>>>>>>>>>>>>>>>>>>>>> tell. I did find some of the code of ghash in lld, but I'm
>>>>>>>>>>>>>>>>>>>>>> fuzzy on the llvm codeview part of it and have never seen
>>>>>>>>>>>>>>>>>>>>>> llvm-readobj/objdump or llvm-pdbutil, but I'm not afraid to
>>>>>>>>>>>>>>>>>>>>>> look :)
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Luckily all of the important code is hidden behind library
>>>>>>>>>>>>>>>>>>>>> calls, and it should already just do the right thing, so I
>>>>>>>>>>>>>>>>>>>>> suspect you won't need to know much about CodeView to do
>>>>>>>>>>>>>>>>>>>>> this.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I think Peter has the right idea about putting this in
>>>>>>>>>>>>>>>>>>>>> llvm-objcopy.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> You can look at one of the existing CopyBinary functions
>>>>>>>>>>>>>>>>>>>>> there, which currently only work for ELF, but you can just
>>>>>>>>>>>>>>>>>>>>> make a new overload that accepts a COFFObjectFile.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I would probably start by iterating over each of the
>>>>>>>>>>>>>>>>>>>>> sections (getNumberOfSections / getSectionName) looking for
>>>>>>>>>>>>>>>>>>>>> .debug$T and .debug$H sections.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> If you find a .debug$H section then you can just skip that
>>>>>>>>>>>>>>>>>>>>> object file.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> If you find a .debug$T but not a .debug$H, then basically do
>>>>>>>>>>>>>>>>>>>>> the same thing that LLD does in PDBLinker::mergeDebugT
>>>>>>>>>>>>>>>>>>>>> (create a CVTypeArray, and pass it to
>>>>>>>>>>>>>>>>>>>>> GloballyHashedType::hashTypes).  That will return an array
>>>>>>>>>>>>>>>>>>>>> of hash values.  (The format of .debug$H is the header,
>>>>>>>>>>>>>>>>>>>>> followed by the hash values.)  Then when you're writing the
>>>>>>>>>>>>>>>>>>>>> list of sections, just add in the .debug$H section right
>>>>>>>>>>>>>>>>>>>>> after the .debug$T section.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Currently llvm-objcopy only writes ELF files, so it would
>>>>>>>>>>>>>>>>>>>>> need to be taught to write COFF files.  We have code to do
>>>>>>>>>>>>>>>>>>>>> this in the yaml2obj utility (specifically, in yaml2coff.cpp
>>>>>>>>>>>>>>>>>>>>> in the function writeCOFF).  There may be a way to move this
>>>>>>>>>>>>>>>>>>>>> code to somewhere else (llvm/Object/COFF.h?) so that it can
>>>>>>>>>>>>>>>>>>>>> be re-used by both yaml2coff and llvm-objcopy, but in the
>>>>>>>>>>>>>>>>>>>>> worst case scenario you could copy the code and re-write it
>>>>>>>>>>>>>>>>>>>>> to work with these new structures.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Lastly, you'll probably want to put all of this behind an
>>>>>>>>>>>>>>>>>>>>> option in llvm-objcopy such as -add-codeview-ghash-section
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
--
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
Leonardo Santagada
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
Leonardo Santagada
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Leonardo Santagada
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>>
>>>>>>>>>>>>> Leonardo Santagada
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>>
>>>>>>>>>>> Leonardo Santagada
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>>
>>>>>>>>> Leonardo Santagada
>>>>>>>>>
>>>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>
>>>>> Leonardo Santagada
>>>>>
>>>>
>>>
>>>
>>> --
>>>
>>> Leonardo Santagada
>>>
>>
>
>
> --
>
> Leonardo Santagada

Zachary Turner via llvm-dev

2018-Jan-25 17:52 UTC

[llvm-dev] [lldb-dev] Trying out lld to link windows binaries (using msvc as a compiler)

Actually I already have a theory that even though you are adding the
section to the section table, you might not be adding a *symbol* for the
section to the symbol table.  So the existing symbols (which reference
sections by index) will all be wrong because you've inserted a new
section.  Still though, obj2yaml would expose that.
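To make the index problem concrete, here is a minimal sketch of the fix-up that inserting a section implies for the symbol table. The `Symbol` struct and `renumber_symbols` function are hypothetical simplifications, not LLVM types; the sketch assumes the usual COFF convention that section numbers are 1-based and that zero or negative values are special (undefined, absolute, debug) and must not be shifted.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified symbol record: a name plus the 1-based
// number of the section the symbol lives in.
struct Symbol {
    std::string name;
    std::int32_t section_number;
};

// After inserting a new section at 1-based position new_section,
// every symbol that referred to that position or a later one now
// points one slot too low, so shift it by one. Special values
// (zero and negative) are left alone.
inline void renumber_symbols(std::vector<Symbol> &symbols,
                             std::int32_t new_section) {
    for (Symbol &sym : symbols) {
        if (sym.section_number >= new_section) {
            ++sym.section_number;
        }
    }
}
```

For example, inserting .debug$H as section 4 leaves a symbol in section 3 untouched and moves a symbol from section 4 to section 5. Any other place that stores a section number (for instance the auxiliary section-definition records used by associative COMDATs) would need the same treatment.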

On Thu, Jan 25, 2018 at 9:50 AM Zachary Turner <zturner at google.com>
wrote:
> Yea as long as you compare clang-cl object file with automatically
> generated .debug$H section against clang-cl object file without .debug$H
> but added after the fact with llvm-objcopy, that should expose the problem
> I think when you run obj2yaml on them.
>
> On Thu, Jan 25, 2018 at 9:49 AM Leonardo Santagada <santagada at gmail.com>
> wrote:
>
>> I did reorder my sections, so that .debug$H is in the correct place, but
>> now I get some errors on duplicate symbols. I created a folder with
>> examples:
>>
>> https://www.dropbox.com/sh/nmvzi44pi0boe76/AAA0f47O5PCJ9JiUc6wVuwBra?dl=0
>>
>> t.obj is generated by vs 2015 and it links fine with lld-link.exe, but
>> tout.obj gives these errors:
>>
>> lld-link.exe /DEBUG:GHASH tout.obj
>> LLD-LINK.EXE: error: duplicate symbol: __local_stdio_printf_options in
>> tout.obj and in LIBCMT.lib(default_local_stdio_options.obj)
>> LLD-LINK.EXE: error: duplicate symbol: __local_stdio_printf_options in
>> tout.obj and in libvcruntime.lib(undname.obj)
>>
>> I'm using PEView from http://wjradburn.com/software/ to look at the
>> files and can't see anything wrong, except some valid differences in the
>> offsets being used for the data (so pointer to data is different between
>> them).
>>
>> I will look into yaml2obj now to see if I see anything else weird going
>> on.
>>
>>
>> On Thu, Jan 25, 2018 at 6:41 PM, Zachary Turner <zturner at google.com>
>> wrote:
>>
>>> I'm pretty confident that cl is not putting anything strange in the
>>> .debug$T sections.  We've done a lot of testing and never seen anything
>>> except CodeView type records in a .debug$T.  My hunch is that your
>>> objcopy patch is probably not doing the right thing in one or more of
>>> the section headers, and this is confusing the linker.
>>>
>>> One idea might be to build a simple object file with clang-cl but
>>> without the magic -mllvm -emit-codeview-ghash-section, then run your
>>> llvm-objcopy on it.  Then build the same object file passing -mllvm
>>> -emit-codeview-ghash-section.  Then run obj2yaml on both and diff the
>>> results.  They should be byte-for-byte identical.  That should give you
>>> a clue about if objcopy is doing something wrong.
>>>
>>> On Thu, Jan 25, 2018 at 2:21 AM Leonardo Santagada <santagada at gmail.com>
>>> wrote:
>>>
>>>> Don't worry, I definitely want to perfect this to generate legal obj
>>>> files, this is just to speed up testing.
>>>>
>>>> Now after patching all the obj files I get these errors when linking a
>>>> small part of our code base (msvc 2017 15.5.3, lld and llvm-objcopy
>>>> 7.0.0):
>>>> lld-link.exe : error : relocation against symbol in discarded section:
>>>> $LN8
>>>> lld-link.exe : error : relocation against symbol in discarded section:
>>>> $LN43
>>>> lld-link.exe : error : relocation against symbol in discarded section:
>>>> $LN37
>>>>
>>>> I'm starting to guess that cl.exe might be putting some random comdat
>>>> or other discardable symbols in the .debug$T and clang doesn't? I will
>>>> try to debug this and see what more I can uncover.
>>>>
>>>> Linking works perfectly without my llvm-objcopy pass to add .debug$H?
>>>>
>>>> On Thu, Jan 25, 2018 at 1:53 AM, Zachary Turner <zturner at google.com>
>>>> wrote:
>>>>
>>>>> It might not influence LLD, but at the same time we don't want to
>>>>> upstream something that is producing technically illegal COFF files.
>>>>> Also good to hear about the planned changes to your header files.
>>>>> Looking forward to hearing about your experiences with clang-cl.
>>>>>
>>>>> On Wed, Jan 24, 2018 at 10:41 AM Leonardo Santagada <
>>>>> santagada at gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I finally got my first .obj file patched with .debug$H to look
>>>>>> somewhat right. I added the new section at the end of the file so I
>>>>>> don't have to recalculate all sections (although now I probably could
>>>>>> position it in the middle, knowing that each section is: SizeOfRawData
>>>>>> + (last.Header.NumberOfRelocations * (4+4+2)) and the $H needs to come
>>>>>> right after $T in the file). That, although illegal based on the coff
>>>>>> specs, doesn't seem like it's going to influence lld.
>>>>>>
>>>>>> Also we talked and we are probably going to do something similar to a
>>>>>> bunch of windows defines and a check for our own define (to guarantee
>>>>>> that no one imported windows.h before win32.h) and drop the namespace
>>>>>> and the conflicting names.
>>>>>>
>>>>>>
>>>>>> On Tue, Jan 23, 2018 at 12:46 AM, Zachary Turner
<zturner at google.com>
>>>>>> wrote:
>>>>>>
>>>>>>> It's very possible that a 3rd party indirect header include is
>>>>>>> involved.  One idea might be like I suggested, where you #define
>>>>>>> _WINDOWS_ in win32.h and guarantee that it's always included first.
>>>>>>> Then those other headers won't be able to #include <windows.h>.  But
>>>>>>> it will probably greatly expand the amount of stuff you have to add
>>>>>>> to win32.h, as you will probably find some callers of functions that
>>>>>>> aren't yet in your win32.h that you'd have to add.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Jan 22, 2018 at 3:28 PM Leonardo Santagada <
>>>>>>> santagada at gmail.com> wrote:
>>>>>>>
>>>>>>>> Ok some information was lost on getting this example to you, I'm
>>>>>>>> sorry for not being clear.
>>>>>>>>
>>>>>>>> We have a huge code base, let's say 90% of it doesn't include
>>>>>>>> either header, 9% include win32.h and 1% includes both, I will try
>>>>>>>> to discover why, but my guess is they include both a third party
>>>>>>>> that includes windows.h and some of our libs that use win32.h.
>>>>>>>>
>>>>>>>> I will try to fully understand this tomorrow.
>>>>>>>>
>>>>>>>> I guess clang will not implement this ever so finishing the object
>>>>>>>> copier is the best solution until all code is ported to clang.
>>>>>>>>
>>>>>>>> On 23 Jan 2018 00:02, "Zachary Turner" <zturner at google.com> wrote:
>>>>>>>>
>>>>>>>>> You said win32.h doesn't include windows.h, but main.cpp does.  So
>>>>>>>>> what's the disadvantage of just including it in win32.h anyway,
>>>>>>>>> since it's already going to be in every translation unit?  (Unless
>>>>>>>>> you didn't mean to #include it in main.cpp)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I guess all I can do is warn you how bad of an idea this is.  For
>>>>>>>>> starters, I already found a bug in your code ;-)
>>>>>>>>>
>>>>>>>>> // stdint.h
>>>>>>>>> typedef int                int32_t;
>>>>>>>>>
>>>>>>>>> // winnt.h
>>>>>>>>> typedef long LONG;
>>>>>>>>>
>>>>>>>>> // windef.h
>>>>>>>>> typedef struct tagPOINT
>>>>>>>>> {
>>>>>>>>>     LONG  x;   // long x
>>>>>>>>>     LONG  y;   // long y
>>>>>>>>> } POINT, *PPOINT, NEAR *NPPOINT, FAR
*LPPOINT;
>>>>>>>>>
>>>>>>>>> // win32.h
>>>>>>>>> typedef int32_t LONG;
>>>>>>>>>
>>>>>>>>> struct POINT
>>>>>>>>> {
>>>>>>>>> LONG x;   // int x
>>>>>>>>> LONG y;   // int y
>>>>>>>>> };
>>>>>>>>>
>>>>>>>>> So POINT is defined two different ways.  In your minimal
>>>>>>>>> interface, it's declared as 2 int32's, which are int.  In the
>>>>>>>>> actual Windows header files, it's declared as 2 longs.
>>>>>>>>>
>>>>>>>>> This might seem like an unimportant bug since int and long are the
>>>>>>>>> same size, but int and long also mangle differently and affect
>>>>>>>>> overload resolution, so you could have weird linker errors or call
>>>>>>>>> the wrong function overload.
>>>>>>>>>
>>>>>>>>> Plus, it illustrates the fact that this struct *actually is* a
>>>>>>>>> different type from the one in the windows header.
>>>>>>>>>
>>>>>>>>> You said at the end that you never intentionally import win32.h
>>>>>>>>> and windows.h from the same translation unit.  But then in this
>>>>>>>>> example you did.  I wonder if you could enforce that by doing this:
>>>>>>>>>
>>>>>>>>> // win32.h
>>>>>>>>> #pragma once
>>>>>>>>>
>>>>>>>>> // Error if windows.h was included before us.
>>>>>>>>> #if defined(_WINDOWS_)
>>>>>>>>> #error "You're including win32.h after having already included windows.h.  Don't do this!"
>>>>>>>>> #endif
>>>>>>>>>
>>>>>>>>> // And also make sure windows.h can't get included after us
>>>>>>>>> #define _WINDOWS_
>>>>>>>>>
>>>>>>>>> For the record, I tried the test case you linked when windows.h is
>>>>>>>>> not included in main.cpp and it works (but still has the bug about
>>>>>>>>> int and long).
>>>>>>>>>
>>>>>>>>> On Mon, Jan 22, 2018 at 2:23 PM Leonardo
Santagada <
>>>>>>>>> santagada at gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> It is super gross, but we copy parts of windows.h because having
>>>>>>>>>> all of it is both gigantic and very, very messy. So our win32.h has a
>>>>>>>>>> couple of thousand lines rather than the 30k+ of windows.h, and we try
>>>>>>>>>> to have zero macros. Win32.h doesn't include windows.h, so using ::BOOL
>>>>>>>>>> wouldn't work. We don't want to create a namespace, we just want a
>>>>>>>>>> cleaner interface to the Windows API. The namespace with C linkage is
>>>>>>>>>> the way to trick cl into allowing us to have both windows.h and Win32.h
>>>>>>>>>> in some files. I really don't see any way for us to have this Win32.h
>>>>>>>>>> without this cl support, so maybe we should either put windows.h in a
>>>>>>>>>> compiled header somewhere and not care that it is infecting everything,
>>>>>>>>>> or just have one place we can call to clean up after including
>>>>>>>>>> windows.h (a massive set of undefs).
>>>>>>>>>>
>>>>>>>>>> So using can't work, because we never intentionally import
>>>>>>>>>> windows.h and win32.h in the same translation unit.
>>>>>>>>>>
>>>>>>>>>> On Mon, Jan 22, 2018 at 7:08 PM,
Zachary Turner <
>>>>>>>>>> zturner at google.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> This is pretty gross, honestly :)
>>>>>>>>>>>
>>>>>>>>>>> Can't you just use using
declarations?
>>>>>>>>>>>
>>>>>>>>>>> namespace Win32 {
>>>>>>>>>>> extern "C" {
>>>>>>>>>>>
>>>>>>>>>>> using ::BOOL;
>>>>>>>>>>> using ::LONG;
>>>>>>>>>>> using ::POINT;
>>>>>>>>>>> using ::LPPOINT;
>>>>>>>>>>>
>>>>>>>>>>> using ::GetCursorPos;
>>>>>>>>>>> }
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>> This works with clang-cl.
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Jan 22, 2018 at 5:39 AM
Leonardo Santagada <
>>>>>>>>>>> santagada at gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Here it is a minimal example,
we do this so we don't have to
>>>>>>>>>>>> import the whole windows api
everywhere.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
https://gist.github.com/santagada/7977e929d31c629c4bf18ebb987f6be3
>>>>>>>>>>>>
>>>>>>>>>>>> On Sun, Jan 21, 2018 at 2:31
AM, Zachary Turner <
>>>>>>>>>>>> zturner at google.com>
wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Clang-cl maintains
compatibility with msvc even in cases where
>>>>>>>>>>>>> it’s non standards
compliant (eg 2 phase name lookup), but we try to keep
>>>>>>>>>>>>> these cases few and far
between.
>>>>>>>>>>>>>
>>>>>>>>>>>>> To help me understand your
case, do you mean you copy
>>>>>>>>>>>>> windows.h and modify it?
How does this lead to the same struct being
>>>>>>>>>>>>> defined twice? If i were to
write this:
>>>>>>>>>>>>>
>>>>>>>>>>>>> struct Foo {};
>>>>>>>>>>>>> struct Foo {};
>>>>>>>>>>>>>
>>>>>>>>>>>>> Is this a small repro of
the issue you’re talking about?
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Sat, Jan 20, 2018 at
3:44 PM Leonardo Santagada <
>>>>>>>>>>>>> santagada at gmail.com>
wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> I can totally see something like incremental linking with a
>>>>>>>>>>>>>> simple padding between obj and a mapping file (which can also help with
>>>>>>>>>>>>>> edit and continue, something we also would love to have).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> We have another developer doing the port to support clang-cl,
>>>>>>>>>>>>>> but although most of our code also goes through a version of clang,
>>>>>>>>>>>>>> migrating the rest to clang-cl has been a fight. From what I heard the main
>>>>>>>>>>>>>> problem is that we have a copy of parts of windows.h (so as not to bring in the
>>>>>>>>>>>>>> awful parts of it, like lower-case macros) and that totally works on cl, but
>>>>>>>>>>>>>> clang (at least 6.0) complains about two structs/vars with the same name,
>>>>>>>>>>>>>> even though they are exactly the same. Making clang-cl as broken as cl.exe
>>>>>>>>>>>>>> is not an option I suppose? I would love to turn on a flag
>>>>>>>>>>>>>> --accept-that-cl-made-bad-decisions-and-live-with-it and have this at least
>>>>>>>>>>>>>> until this is completely fixed in our code base.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The biggest win with moving to cl would be a better, more
>>>>>>>>>>>>>> standards-compliant compiler, no 1 minute compiles on heavily templated
>>>>>>>>>>>>>> files and maybe the holy grail of ThinLTO.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Sat, Jan 20, 2018 at
10:56 PM, Zachary Turner <
>>>>>>>>>>>>>> zturner at
google.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 10-15s will be hard
without true incremental linking.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> At some point
that's going to be the only way to get any
>>>>>>>>>>>>>>> faster, but
incremental linking is hard (putting it lightly), and since our
>>>>>>>>>>>>>>> full links are
already really fast we think we can get reasonably close to
>>>>>>>>>>>>>>> link.exe
incremental speeds with full links.  But it's never enough and I
>>>>>>>>>>>>>>> will always want it
to be faster, so you may see incremental linking in the
>>>>>>>>>>>>>>> future after we hit
a performance wall with full link speed :)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> In any case,
I'm definitely interested in seeing what kind
>>>>>>>>>>>>>>> of numbers you get
with /debug:ghash after you get this llvm-objcopy
>>>>>>>>>>>>>>> feature
implemented.  So keep me updated :)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> As an aside, have
you tried building with clang instead of
>>>>>>>>>>>>>>> cl?  If you build
with clang you wouldn't even have to do this llvm-objcopy
>>>>>>>>>>>>>>> work, because it
would "just work".  If you've tried but ran into issues
>>>>>>>>>>>>>>> I'm interested
in hearing about those too.  On the other hand, it's also
>>>>>>>>>>>>>>> reasonable to only
switch one thing at a time.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Sat, Jan 20,
2018 at 1:34 PM Leonardo Santagada <
>>>>>>>>>>>>>>> santagada at
gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> If we get to < 30s I think most users would prefer it to
>>>>>>>>>>>>>>>> link.exe; just hoping there are still some more optimizations to get closer
>>>>>>>>>>>>>>>> to ELF linking times (around 10-15s here).
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Sat, Jan 20,
2018 at 9:50 PM, Zachary Turner <
>>>>>>>>>>>>>>>> zturner at
google.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Generally
speaking a good rule of thumb is that
>>>>>>>>>>>>>>>>>
/debug:ghash will be close to or faster than /debug:fastlink, but with none
>>>>>>>>>>>>>>>>> of the
penalties like slow debug time
>>>>>>>>>>>>>>>>> On Sat, Jan
20, 2018 at 12:44 PM Zachary Turner <
>>>>>>>>>>>>>>>>> zturner at
google.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Chrome
is actually one of my exact benchmark cases. When
>>>>>>>>>>>>>>>>>>
building blink_core.dll and browser_tests.exe, i get anywhere from a 20-40%
>>>>>>>>>>>>>>>>>>
reduction in link time. We have some other optimizations in the pipeline
>>>>>>>>>>>>>>>>>> but not
upstream yet.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> My best
time so far (including other optimizations not
>>>>>>>>>>>>>>>>>> yet
upstream) is 28s on blink_core.dll, compared to 110s with /debug
>>>>>>>>>>>>>>>>>> On Sat,
Jan 20, 2018 at 12:28 PM Leonardo Santagada <
>>>>>>>>>>>>>>>>>>
santagada at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On
Sat, Jan 20, 2018 at 9:05 PM, Zachary Turner <
>>>>>>>>>>>>>>>>>>>
zturner at google.com> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
You probably don't want to go down the same route that
>>>>>>>>>>>>>>>>>>>>
clang goes through to write the object file.  If you think yaml2coff is
>>>>>>>>>>>>>>>>>>>>
convoluted, the way clang does it will just give you a headache.  There are
>>>>>>>>>>>>>>>>>>>>
multiple abstractions involved to account for different object file formats
>>>>>>>>>>>>>>>>>>>>
(ELF, COFF, MachO) and output formats (Assembly, binary file).  At least
>>>>>>>>>>>>>>>>>>>>
with yaml2coff
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I
think your phrase got cut there, but yeah I just found
>>>>>>>>>>>>>>>>>>>
AsmPrinter.cpp and it is convoluted.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
It's true that yaml2coff is using the COFFParser
>>>>>>>>>>>>>>>>>>>>
structure, but if you look at the writeCOFF function
>>>>>>>>>>>>>>>>>>>>
in yaml2coff it's pretty bare-metal.  The logic you need will be almost
>>>>>>>>>>>>>>>>>>>>
identical, except that instead of checking the COFFParser for the various
>>>>>>>>>>>>>>>>>>>>
fields, you'll check the existing COFFObjectFile, which should have similar
>>>>>>>>>>>>>>>>>>>>
fields.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
The only thing you need to do differently is when writing
>>>>>>>>>>>>>>>>>>>>
the section table and section contents, to insert a new entry.  Since
>>>>>>>>>>>>>>>>>>>>
you're injecting a section into the middle, you'll also probably need to
>>>>>>>>>>>>>>>>>>>>
push back the file pointer of all subsequent sections so that they don't
>>>>>>>>>>>>>>>>>>>>
overlap.  (e.g. if the original sections are 1, 2, 3, 4, 5 and you insert
>>>>>>>>>>>>>>>>>>>>
between 2 and 3, then the original sections 3, 4, and 5 would need to have
>>>>>>>>>>>>>>>>>>>>
their FilePointerToRawData offset by the size of the new section).
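The offset adjustment described above can be sketched roughly as follows. `SectionInfo` and `insertSectionAt` are hypothetical names, not LLVM types or APIs. The PE/COFF section header calls the field `PointerToRawData`, so the sketch uses that name. It models only the shift described in the message and ignores everything else a real writer would have to update (the section count, the grown section table, relocation and symbol table pointers).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified view of a COFF section header (not the LLVM type).
struct SectionInfo {
  std::string Name;
  std::uint32_t PointerToRawData; // file offset of the section contents
  std::uint32_t SizeOfRawData;    // size of the contents in bytes
};

// Inserts a new section at position Index (directly after Sections[Index - 1])
// and pushes back the file offset of every section that used to follow it.
inline void insertSectionAt(std::vector<SectionInfo> &Sections,
                            std::size_t Index, const std::string &Name,
                            std::uint32_t Size) {
  assert(Index > 0 && Index <= Sections.size() &&
         "sketch only handles inserting after an existing section");
  const SectionInfo &Prev = Sections[Index - 1];
  // The new section starts where the previous one ends.
  SectionInfo New{Name, Prev.PointerToRawData + Prev.SizeOfRawData, Size};
  // Every later section moves back by the size of the new section.
  for (std::size_t I = Index; I < Sections.size(); ++I)
    Sections[I].PointerToRawData += Size;
  Sections.insert(Sections.begin() + static_cast<std::ptrdiff_t>(Index), New);
}
```

With sections at offsets 100, 110 and 130, inserting an 8-byte section at index 2 places it at offset 130 and moves the old third section to 138.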
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I
have the PE/COFF spec open here and I'm happy that I
>>>>>>>>>>>>>>>>>>>
read a bit of it so I actually know what you are talking about... yeah it
>>>>>>>>>>>>>>>>>>>
doesn't seem too complicated.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
If you need to know what values to put for the other
>>>>>>>>>>>>>>>>>>>>
fields in a section header, run `dumpbin /headers foo.obj` on a
>>>>>>>>>>>>>>>>>>>>
clang-generated object file that has a .debug$H section already (e.g. run
>>>>>>>>>>>>>>>>>>>>
clang with -emit-codeview-ghash-section, and look at the properties of the
>>>>>>>>>>>>>>>>>>>>
.debug$H section and use the same values).
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
Thanks I will do that and then also look at how the
>>>>>>>>>>>>>>>>>>>
CodeView part of the code does it if I can't understand some of it.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
The only invariant that needs to be maintained is that
>>>>>>>>>>>>>>>>>>>>
Section[N]->FilePointerOfRawData == Section[N-1]->FilePointerOfRawData +
>>>>>>>>>>>>>>>>>>>>
Section[N-1]->SizeOfRawData
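A small checker for the invariant stated above might look like this. `RawSection` and `sectionsAreContiguous` are hypothetical names used only for illustration, with the field named `PointerToRawData` as in the PE/COFF section header. The check is deliberately as simple as the statement in the message; it assumes every section has raw data and that nothing (such as relocations) sits between consecutive sections.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified section record: only the two fields the invariant
// talks about.
struct RawSection {
  std::uint32_t PointerToRawData; // file offset of the section contents
  std::uint32_t SizeOfRawData;    // size of the contents in bytes
};

// Returns true when each section starts exactly where the previous one ends,
// i.e. S[N].PointerToRawData == S[N-1].PointerToRawData + S[N-1].SizeOfRawData.
inline bool sectionsAreContiguous(const std::vector<RawSection> &S) {
  for (std::size_t N = 1; N < S.size(); ++N) {
    if (S[N].PointerToRawData !=
        S[N - 1].PointerToRawData + S[N - 1].SizeOfRawData)
      return false;
  }
  return true;
}
```

Running a check like this on the section list just before writing the file is a cheap way to catch a forgotten offset update after inserting a section.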
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
Well, that and all the sections need to be on the final
>>>>>>>>>>>>>>>>>>>
file... But I'm hopeful.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
Does anyone have timings for linking a big project like chrome
>>>>>>>>>>>>>>>>>>>
with this so that at least I know what kind of performance to expect?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> My
numbers are something like:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> 1
pdb per obj file: link.exe takes ~15 minutes and 16GB
>>>>>>>>>>>>>>>>>>> of
ram, lld-link.exe takes 2:30 minutes and ~8GB of ram
>>>>>>>>>>>>>>>>>>>
around 10 pdbs per folder: link.exe takes 1 minute and
>>>>>>>>>>>>>>>>>>>
2-3GB of ram, lld-link.exe takes 1:30 minutes and ~6GB of ram
>>>>>>>>>>>>>>>>>>>
fastlink: link.exe takes 40 seconds, but then 20 seconds
>>>>>>>>>>>>>>>>>>> of
loading at the first break point in the debugger and we lost DIA support
>>>>>>>>>>>>>>>>>>> for
listing symbols.
>>>>>>>>>>>>>>>>>>>
incremental: link.exe takes 8 seconds, but it only
>>>>>>>>>>>>>>>>>>>
happens when very minor changes happen.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> We
have a non-negligible number of symbols used on some
>>>>>>>>>>>>>>>>>>>
runtime systems.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
On Sat, Jan 20, 2018 at 11:52 AM Leonardo Santagada <
>>>>>>>>>>>>>>>>>>>>
santagada at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
Thanks for the tips, I now have something that reads
>>>>>>>>>>>>>>>>>>>>>
the obj file, finds .debug$T sections and global hashes it (proof of
>>>>>>>>>>>>>>>>>>>>>
concept kind of code). What I can't find is: how does clang itself write
>>>>>>>>>>>>>>>>>>>>>
the coff files with global hashes, as that might help me understand how to
>>>>>>>>>>>>>>>>>>>>>
create the .debug$H section, how to update the file section count and how
>>>>>>>>>>>>>>>>>>>>>
to properly write this back.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
The code on yaml2coff is expecting to be working on
>>>>>>>>>>>>>>>>>>>>>
the yaml COFFParser struct and I'm having quite a bit of a headache turning
>>>>>>>>>>>>>>>>>>>>>
the COFFObjectFile into a COFFParser object or compatible... Tomorrow I
>>>>>>>>>>>>>>>>>>>>>
might try the very roundabout path of coff2yaml and then yaml2coff with
>>>>>>>>>>>>>>>>>>>>>
the hashes header... but it seems way too inefficient and convoluted.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
On Fri, Jan 19, 2018 at 10:38 PM, Zachary Turner <
>>>>>>>>>>>>>>>>>>>>>
zturner at google.com> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
On Fri, Jan 19, 2018 at 1:02 PM Leonardo Santagada <
>>>>>>>>>>>>>>>>>>>>>>
santagada at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
On Fri, Jan 19, 2018 at 9:44 PM, Zachary Turner <
>>>>>>>>>>>>>>>>>>>>>>>
zturner at google.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
On Fri, Jan 19, 2018 at 12:29 PM Leonardo Santagada
>>>>>>>>>>>>>>>>>>>>>>>>
<santagada at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
Hi,
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
No I didn't, I used cl.exe from the visual studio
>>>>>>>>>>>>>>>>>>>>>>>>>
toolchain. What I'm proposing is a tool for processing .obj files in COFF
>>>>>>>>>>>>>>>>>>>>>>>>>
format, reading them and generating the GHASH part.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
To make our build faster we use hundreds of unity
>>>>>>>>>>>>>>>>>>>>>>>>>
build files (.cpp's with a lot of other .cpp's in them aka munch files)
but
>>>>>>>>>>>>>>>>>>>>>>>>>
still have a lot of single .cpp's as well (in total something like 3.4k
>>>>>>>>>>>>>>>>>>>>>>>>>
.obj files).
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
ps: sorry for sending to the wrong list, I was
>>>>>>>>>>>>>>>>>>>>>>>>>
reading about llvm mailing lists and jumped when I saw what I thought was a
>>>>>>>>>>>>>>>>>>>>>>>>>
lld exclusive list.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
A tool like this would be useful, yes.  We've
>>>>>>>>>>>>>>>>>>>>>>>>
talked about it internally as well and agreed it would be useful, we just
>>>>>>>>>>>>>>>>>>>>>>>>
haven't prioritized it.  If you're interested in submitting a patch
along
>>>>>>>>>>>>>>>>>>>>>>>>
those lines though, I think it would be a good addition.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
I'm not sure what the best place for it would be.
>>>>>>>>>>>>>>>>>>>>>>>>
llvm-readobj and llvm-objdump seem like obvious choices, but they are
>>>>>>>>>>>>>>>>>>>>>>>>
intended to be read-only, so perhaps they wouldn't be a good fit.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
llvm-pdbutil is kind of a hodgepodge of everything
>>>>>>>>>>>>>>>>>>>>>>>>
else related to PDBs and symbols, so I wouldn't be opposed to making a new
>>>>>>>>>>>>>>>>>>>>>>>>
subcommand there called "ghash" or something that could process an
object
>>>>>>>>>>>>>>>>>>>>>>>>
file and output a new object file with a .debug$H section.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
A third option would be to make a new tool for it.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
I don't think it would be that hard to write.  If
>>>>>>>>>>>>>>>>>>>>>>>>
you're interested in trying to make a patch for this, I can offer some
>>>>>>>>>>>>>>>>>>>>>>>>
guidance on where to look in the code.  Otherwise it's something that
we'll
>>>>>>>>>>>>>>>>>>>>>>>>
probably get to, I'm just not sure when.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
I would love to write it and contribute it back,
>>>>>>>>>>>>>>>>>>>>>>>
please do tell, I did find some of the code of ghash in lld, but I'm fuzzy
>>>>>>>>>>>>>>>>>>>>>>>
on the llvm codeview part of it and have never seen llvm-readobj/objdump or
>>>>>>>>>>>>>>>>>>>>>>>
llvm-pdbutil, but I'm not afraid to look :)
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
Luckily all of the important code is hidden behind
>>>>>>>>>>>>>>>>>>>>>>
library calls, and it should already just do the right thing, so I suspect
>>>>>>>>>>>>>>>>>>>>>>
you won't need to know much about CodeView to do this.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
I think Peter has the right idea about putting this
>>>>>>>>>>>>>>>>>>>>>>
in llvm-objcopy.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
You can look at one of the existing CopyBinary
>>>>>>>>>>>>>>>>>>>>>>
functions there, which currently only work for ELF, but you can just make a
>>>>>>>>>>>>>>>>>>>>>>
new overload that accepts a COFFObjectFile.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
I would probably start by iterating over each of the
>>>>>>>>>>>>>>>>>>>>>>
sections (getNumberOfSections / getSectionName) looking for .debug$T and
>>>>>>>>>>>>>>>>>>>>>>
.debug$H sections.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
If you find a .debug$H section then you can just skip
>>>>>>>>>>>>>>>>>>>>>>
that object file.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
If you find a .debug$T but not a .debug$H, then
>>>>>>>>>>>>>>>>>>>>>>
basically do the same thing that LLD does in PDBLinker::mergeDebugT
>>>>>>>>>>>>>>>>>>>>>>
(create a CVTypeArray, and pass it to GloballyHashedType::hashTypes).  That
>>>>>>>>>>>>>>>>>>>>>>
will return an array of hash values.  (the format of .debug$H is the
>>>>>>>>>>>>>>>>>>>>>>
header, followed by the hash values).  Then when you're writing the list of
>>>>>>>>>>>>>>>>>>>>>>
sections, just add in the .debug$H section right after the .debug$T section.
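The per-object decision described above (skip if .debug$H exists, otherwise add one right after .debug$T) can be sketched without any LLVM dependency. `GhashAction`, `classifyObject` and `debugHInsertIndex` are hypothetical names, and the input is assumed to be the list of section names in file order. The hashing itself and the .debug$H contents are left out.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// What to do with one object file, based only on its section names.
enum class GhashAction {
  SkipAlreadyHashed, // a .debug$H section is already present
  AddDebugH,         // has .debug$T but no .debug$H: hash the types and add it
  NoTypeInfo         // neither section: nothing to hash
};

inline GhashAction classifyObject(const std::vector<std::string> &Names) {
  bool HasDebugT = false;
  for (const std::string &N : Names) {
    if (N == ".debug$H")
      return GhashAction::SkipAlreadyHashed;
    if (N == ".debug$T")
      HasDebugT = true;
  }
  return HasDebugT ? GhashAction::AddDebugH : GhashAction::NoTypeInfo;
}

// Index at which a new .debug$H entry would go: directly after the first
// .debug$T. Returns Names.size() when there is no .debug$T section.
inline std::size_t debugHInsertIndex(const std::vector<std::string> &Names) {
  for (std::size_t I = 0; I < Names.size(); ++I)
    if (Names[I] == ".debug$T")
      return I + 1;
  return Names.size();
}
```

For an object with sections `.text`, `.debug$S`, `.debug$T`, `.data`, this classifies it as `AddDebugH` with insert index 3.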
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
Currently llvm-objcopy only writes ELF files, so it
>>>>>>>>>>>>>>>>>>>>>>
would need to be taught to write COFF files.  We have code to do this in
>>>>>>>>>>>>>>>>>>>>>>
the yaml2obj utility (specifically, in yaml2coff.cpp in the function
>>>>>>>>>>>>>>>>>>>>>>
writeCOFF).  There may be a way to move this code to somewhere else
>>>>>>>>>>>>>>>>>>>>>>
(llvm/Object/COFF.h?) so that it can be re-used by both yaml2coff and
>>>>>>>>>>>>>>>>>>>>>>
llvm-objcopy, but in the worst case scenario you could copy the code and
>>>>>>>>>>>>>>>>>>>>>>
re-write it to work with these new structures.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
Lastly, you'll probably want to put all of this
>>>>>>>>>>>>>>>>>>>>>>
behind an option in llvm-objcopy such as -add-codeview-ghash-section
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
--
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
Leonardo Santagada
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
Leonardo Santagada
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Leonardo
Santagada
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Leonardo Santagada
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>>
>>>>>>>>>>>> Leonardo Santagada
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>>
>>>>>>>>>> Leonardo Santagada
>>>>>>>>>>
>>>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>>
>>>>>> Leonardo Santagada
>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> Leonardo Santagada
>>>>
>>>
>>
>>
>> --
>>
>> Leonardo Santagada
>>