Reid Kleckner via llvm-dev
2019-Feb-27 19:10 UTC
[llvm-dev] Making LLD PDB generation faster
This could be ICF. There were lots of issues with ICF on ARM64, but they are
not inherently ARM64-specific; they just come up there more often. See
https://reviews.llvm.org/D56986 which fixes that.

Easiest thing is always to profile or add /time to see what's slow.

On Wed, Feb 27, 2019 at 6:30 AM Leonardo Santagada <santagada at gmail.com> wrote:

> Does anyone know why lld takes more than 30 minutes to link lld without
> symbols on release?
>
> The command line seems simple enough:
>
> C:\PROGRA~1\LLVM\bin\lld-link.exe /nologo @CMakeFiles\lld.rsp
> /out:bin\lld.exe /implib:lib\lld.lib /version:0.0 /machine:x64
> -fuse-ld=lld /STACK:10000000 /INCREMENTAL:NO /subsystem:console
> /MANIFEST /MANIFESTFILE:bin\lld.exe.manifest
>
> On Mon, Feb 25, 2019 at 8:20 PM Leonardo Santagada <santagada at gmail.com> wrote:
> >
> > Sadly the patch on https://reviews.llvm.org/D55585 didn't apply on my
> > clone of llvm at all :( It will take me quite some time to test this
> > out.
> >
> > On Mon, Feb 25, 2019 at 5:08 PM Alexandre Ganea <alexandre.ganea at ubisoft.com> wrote:
> > >
> > > For enabling large memory pages, see this link:
> > > https://support.sisoftware.co.uk/knowledgebase.php?article=52
> > >
> > > Meow hash isn't in the patch I posted, but you can use xxHash, it is
> > > good enough. Just add /hasher:xxhash to the LLD cmd-line.
> > >
> > > -----Original Message-----
> > > From: Leonardo Santagada <santagada at gmail.com>
> > > Sent: Monday, February 25, 2019 11:05 AM
> > > To: Alexandre Ganea <alexandre.ganea at ubisoft.com>
> > > Cc: Zachary Turner <zturner at google.com>; Reid Kleckner <rnk at google.com>; llvm-dev <llvm-dev at lists.llvm.org>
> > > Subject: Re: [llvm-dev] Making LLD PDB generation faster
> > >
> > > Times for lld compiled with LTO:
> > >
> > >   Input File Reading:           1430 ms (  3.3%)
> > >   Code Layout:                   486 ms (  1.1%)
> > >   PDB Emission (Cumulative):   41042 ms ( 94.6%)
> > >     Add Objects:               33117 ms ( 76.4%)
> > >       Type Merging:            25861 ms ( 59.6%)
> > >       Symbol Merging:           7011 ms ( 16.2%)
> > >     TPI Stream Layout:           996 ms (  2.3%)
> > >     Globals Stream Layout:       513 ms (  1.2%)
> > >     Commit to Disk:             5175 ms ( 11.9%)
> > >   Commit Output File:             37 ms (  0.1%)
> > > -------------------------------------------------
> > > Total Link Time:               43366 ms (100.0%)
> > >
> > > LTO didn't help much :(
> > >
> > > Now I will try Alexandre's patches and switch to xxHash64 or Meow
> > > hashing. I need to discover how to enable huge pages on my Windows
> > > (1809).
> > >
> > > ps: Need to figure out how to limit the number of link jobs in ninja,
> > > as that almost used the whole 128GB of RAM on my machine. On our
> > > distributed build system we can limit linking jobs (which are the only
> > > strictly local jobs) to 8.
> > >
> > > On Mon, Feb 25, 2019 at 4:47 PM Alexandre Ganea <alexandre.ganea at ubisoft.com> wrote:
> > > >
> > > > …however it is very slow to compile, because /MP isn’t currently
> > > > supported by clang-cl. So each CPP is compiled sequentially, one
> > > > after another. Thus my patch for adding /MP.
> > > >
> > > > From: Alexandre Ganea
> > > > Sent: Monday, February 25, 2019 10:42 AM
> > > > To: Zachary Turner <zturner at google.com>; Leonardo Santagada <santagada at gmail.com>
> > > > Cc: Reid Kleckner <rnk at google.com>; llvm-dev <llvm-dev at lists.llvm.org>
> > > > Subject: RE: [llvm-dev] Making LLD PDB generation faster
> > > >
> > > > Yes, -Tllvm works.
> > > >
> > > > From: Zachary Turner <zturner at google.com>
> > > > Sent: Monday, February 25, 2019 10:36 AM
> > > > To: Leonardo Santagada <santagada at gmail.com>
> > > > Cc: Alexandre Ganea <alexandre.ganea at ubisoft.com>; Reid Kleckner <rnk at google.com>; llvm-dev <llvm-dev at lists.llvm.org>
> > > > Subject: Re: [llvm-dev] Making LLD PDB generation faster
> > > >
> > > > Is -Tllvm even supported? I thought the only thing you could pass for
> > > > -T was -Thost=x64
> > > >
> > > > On Mon, Feb 25, 2019 at 6:52 AM Leonardo Santagada <santagada at gmail.com> wrote:
> > > >
> > > > I think it's a huge bug that it doesn't raise any errors or warnings
> > > > about it. But I will open a ticket on cmake: they should be using
> > > > clang-cl.exe and lld-link.exe if T="llvm", and probably set host to
> > > > 64 bit as well.
> > > >
> > > > On Mon, Feb 25, 2019 at 3:34 PM Zachary Turner <zturner at google.com> wrote:
> > > > >
> > > > > I don’t think changing the compiler or linker is supported with the
> > > > > VS generator, but I also don’t think it’s a bug.
> > > > >
> > > > > On Mon, Feb 25, 2019 at 6:31 AM Alexandre Ganea <alexandre.ganea at ubisoft.com> wrote:
> > > > > >
> > > > > > Can you please try using Ninja instead?
> > > > > >
> > > > > > cmake -G Ninja f:/svn/llvm -DCMAKE_BUILD_TYPE=Release
> > > > > > -DLLVM_OPTIMIZED_TABLEGEN=true
> > > > > > -DLLVM_EXTERNAL_LLD_SOURCE_DIR=f:/svn/lld
> > > > > > -DLLVM_TOOL_LLD_BUILD=true -DLLVM_ENABLE_LLD=true
> > > > > > -DCMAKE_C_COMPILER="C:/Program Files/LLVM/bin/clang-cl.exe"
> > > > > > -DCMAKE_CXX_COMPILER="C:/Program Files/LLVM/bin/clang-cl.exe"
> > > > > > -DCMAKE_LINKER="C:/Program Files/LLVM/bin/lld-link.exe"
> > > > > > -DLLVM_ENABLE_PDB=true
> > > > > >
> > > > > > It will be faster to compile. The setup I use is the above Ninja
> > > > > > cmd-line for compiling optimized builds; and in addition, I keep
> > > > > > the Visual Studio generator, as you do, but only for having a
> > > > > > .sln to debug. It is a bit annoying to cmake twice, in two
> > > > > > different build folders, but you can write a batch script.
> > > > > >
> > > > > > If the above works, maybe you should log the bug on
> > > > > > https://bugs.llvm.org/ so it is not forgotten.
> > > > > >
> > > > > > Alex.
> > > > > >
> > > > > > -----Original Message-----
> > > > > > From: Leonardo Santagada <santagada at gmail.com>
> > > > > > Sent: Monday, February 25, 2019 9:04 AM
> > > > > > To: Alexandre Ganea <alexandre.ganea at ubisoft.com>
> > > > > > Cc: Zachary Turner <zturner at google.com>; Reid Kleckner <rnk at google.com>; llvm-dev <llvm-dev at lists.llvm.org>
> > > > > > Subject: Re: [llvm-dev] Making LLD PDB generation faster
> > > > > >
> > > > > > Ok so there's a lot of confusion on cmake regarding using llvm as
> > > > > > a toolset. It still does all its checks against cl.exe (not
> > > > > > clang-cl) and somehow overrides CMAKE_LINKER to be link.exe. I
> > > > > > tried a couple of places including:
> > > > > >
> > > > > > cmake -G "Visual Studio 15 2017" -A x64 -T"llvm",host=x64
> > > > > > -DCMAKE_LINKER="C:/Program Files/LLVM/bin/lld-link.exe"
> > > > > > -DCMAKE_CXX_COMPILER="C:/Program Files/LLVM/bin/clang-cl.exe"
> > > > > > -DCMAKE_C_COMPILER="C:/Program Files/LLVM/bin/clang-cl.exe"
> > > > > > -DLLVM_ENABLE_LTO=true -DLLVM_ENABLE_PDB=true
> > > > > > -DLLVM_ENABLE_PROJECTS=lld ../llvm
> > > > > >
> > > > > > but it seems like the generator overrides it.
> > > > > >
> > > > > > ps: Created a phabricator account
> > > > > >
> > > > > > On Mon, Feb 25, 2019 at 2:48 PM Alexandre Ganea <alexandre.ganea at ubisoft.com> wrote:
> > > > > > >
> > > > > > > That's good news. For having debug info, you could try adding
> > > > > > > /Z7 on the cmake cmd-line, such as -DCMAKE_CXX_FLAGS="/Z7". Or
> > > > > > > use the 'RelWithDebInfo' target instead of 'Release' and add
> > > > > > > -DCMAKE_CXX_FLAGS="/Ob2" (because that target uses /Ob1 as a
> > > > > > > default).
> > > > > > >
> > > > > > > Can you please send a patch on Phabricator if you fix the
> > > > > > > LLVM_ENABLE_PDB issue with Clang? The goal is to have
> > > > > > > performance out-of-the-box.
> > > > > > >
> > > > > > > Alex.
> > > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Leonardo Santagada <santagada at gmail.com>
> > > > > > > Sent: Monday, February 25, 2019 7:36 AM
> > > > > > > To: Alexandre Ganea <alexandre.ganea at ubisoft.com>
> > > > > > > Cc: Zachary Turner <zturner at google.com>; Reid Kleckner <rnk at google.com>; llvm-dev <llvm-dev at lists.llvm.org>
> > > > > > > Subject: Re: [llvm-dev] Making LLD PDB generation faster
> > > > > > >
> > > > > > > With your patch for cmake and reconfiguring it with
> > > > > > > "cmake -G "Visual Studio 15 2017" -A x64 -T"llvm",host=x64
> > > > > > > -DLLVM_ENABLE_PDB=true -DLLVM_ENABLE_PROJECTS=lld ../llvm"
> > > > > > > we get these results:
> > > > > > >
> > > > > > >   Input File Reading:           1602 ms (  3.5%)
> > > > > > >   Code Layout:                   493 ms (  1.1%)
> > > > > > >   PDB Emission (Cumulative):   43127 ms ( 94.5%)
> > > > > > >     Add Objects:               34577 ms ( 75.8%)
> > > > > > >       Type Merging:            26709 ms ( 58.5%)
> > > > > > >       Symbol Merging:           7598 ms ( 16.7%)
> > > > > > >     TPI Stream Layout:          1107 ms (  2.4%)
> > > > > > >     Globals Stream Layout:       602 ms (  1.3%)
> > > > > > >     Commit to Disk:             5636 ms ( 12.4%)
> > > > > > >   Commit Output File:             16 ms (  0.0%)
> > > > > > > -------------------------------------------------
> > > > > > > Total Link Time:               45626 ms (100.0%)
> > > > > > >
> > > > > > > Unfortunately there was no PDB generated with lld.exe (or any
> > > > > > > other binaries) so I can't debug them. It seems like
> > > > > > > LLVM_ENABLE_PDB is not made to support using clang to compile
> > > > > > > itself, as it tries to add /Zi to the targets instead of /Z7
> > > > > > > and global hashes. I can patch it over here, but we probably
> > > > > > > want to fix this in cmake and in the docs, as it's not clear at
> > > > > > > all how to compile lld in a performant 64-bit way.
> > > > > > >
> > > > > > > On Mon, Feb 25, 2019 at 2:38 AM Alexandre Ganea <alexandre.ganea at ubisoft.com> wrote:
> > > > > > > >
> > > > > > > > How do you compile LLD? There's a big difference between
> > > > > > > > using MSVC vs Clang. The parallel ghash patch I was
> > > > > > > > mentioning is almost 2x as fast when using Clang 7.0+ vs.
> > > > > > > > MSVC 15.9+, I don't know exactly why. I also suggest you use
> > > > > > > > the Release target. You should also grab this patch:
> > > > > > > > https://reviews.llvm.org/D55056 - I had to revert it because
> > > > > > > > it was causing issues with LLDB. But it will give an
> > > > > > > > improvement for LLD. Please let me know if that improves your
> > > > > > > > timings.
> > > > > > > >
> > > > > > > > The page faults are probably the OS loading from disk: most,
> > > > > > > > if not all the files are accessed by LLD by mmap'ing them.
> > > > > > > >
> > > > > > > > The lockless DenseHash I was talking about will be published
> > > > > > > > in an upcoming patch. As for reproducibility, this can be an
> > > > > > > > issue on build systems. But on local machines, we could
> > > > > > > > explicitly state that we want non-deterministic builds,
> > > > > > > > through some cmd-line flag. If your 57sec for "Type Merging"
> > > > > > > > transforms into 5sec when non-deterministic, I think that's
> > > > > > > > worth it.
> > > > > > > >
> > > > > > > > Alex.
> > > > > > > >
> > > > > > > > -----Original Message-----
> > > > > > > > From: Leonardo Santagada <santagada at gmail.com>
> > > > > > > > Sent: Sunday, February 24, 2019 6:43 PM
> > > > > > > > To: Alexandre Ganea <alexandre.ganea at ubisoft.com>
> > > > > > > > Cc: Zachary Turner <zturner at google.com>; Reid Kleckner <rnk at google.com>; llvm-dev <llvm-dev at lists.llvm.org>
> > > > > > > > Subject: Re: [llvm-dev] Making LLD PDB generation faster
> > > > > > > >
> > > > > > > > More info inline, I think there are a couple of
> > > > > > > > misconceptions on what I'm doing:
> > > > > > > >
> > > > > > > > 1) I already patch all my .obj files to contain .debug$H
> > > > > > > >    entries so it is all ghashed already
> > > > > > > > 2) All the 35s is spent adding to the DenseMap
> > > > > > > >
> > > > > > > > Here are my current times (lld-link.exe compiled with -O2 so
> > > > > > > > no lto/pgo), lld generates a 141 MB binary and 1.2GB pdb file:
> > > > > > > >
> > > > > > > >   Input File Reading:           1724 ms (  2.1%)
> > > > > > > >   Code Layout:                   482 ms (  0.6%)
> > > > > > > >   PDB Emission (Cumulative):   79261 ms ( 96.8%)
> > > > > > > >     Add Objects:               68650 ms ( 83.8%)
> > > > > > > >       Type Merging:            57534 ms ( 70.2%)
> > > > > > > >       Symbol Merging:          10822 ms ( 13.2%)
> > > > > > > >     TPI Stream Layout:          1501 ms (  1.8%)
> > > > > > > >     Globals Stream Layout:       770 ms (  0.9%)
> > > > > > > >     Commit to Disk:             7007 ms (  8.6%)
> > > > > > > >   Commit Output File:             19 ms (  0.0%)
> > > > > > > > -------------------------------------------------
> > > > > > > > Total Link Time:               81900 ms (100.0%)
> > > > > > > >
> > > > > > > > Our target is for < 20 seconds linking, anything below 40
> > > > > > > > seconds would be ok. Ideal times would be around 8s (in which
> > > > > > > > it will mostly beat link.exe incremental linking).
> > > > > > > >
> > > > > > > > My tip for profiling is using superluminal
> > > > > > > > (https://www.superluminal.eu/), it's the easiest way to see
> > > > > > > > everything your code is doing.
> > > > > > > >
> > > > > > > > On Sun, Feb 24, 2019 at 5:18 PM Alexandre Ganea <alexandre.ganea at ubisoft.com> wrote:
> > > > > > > > >
> > > > > > > > > Leonardo, to answer to your questions, yes to all of them :)
> > > > > > > > > You can take a look at this prototype/proposal:
> > > > > > > > > https://reviews.llvm.org/D55585
> > > > > > > > >
> > > > > > > > > Overall, computing ghashes in parallel at link-time and
> > > > > > > > > merging Types with them is less costly than the current
> > > > > > > > > approach to merging. The 35sec you’re seeing for merging
> > > > > > > > > should go down to about 15sec.
> > > > > > > >
> > > > > > > > I don't do much computing of ghashes as we already preprocess
> > > > > > > > all .obj files from msvc to add a .debug$H to them. The whole
> > > > > > > > 35 seconds is spent in just the densehash findbucket function.
> > > > > > > > The rest of the time is mostly pagefaults (I guess to load in
> > > > > > > > obj data and to grow the final pdb?).
> > > > > > > >
> > > > > > > > > The patch doesn’t parallelize (yet) the Type merging itself,
> > > > > > > > > but we have an alternate multithread-suitable implementation
> > > > > > > > > of DenseHash which already supports lockless, wait-free,
> > > > > > > > > insert/fetch/resize.
> > > > > > > >
> > > > > > > > Where is this lockless densehash? This is the part where I
> > > > > > > > would love to help, but if there is a densehash it is probably
> > > > > > > > just creating the threads and letting them merge the results.
> > > > > > > > I'm a bit afraid of reproducibility of builds, but as we
> > > > > > > > already don't have that with link.exe we are not really losing
> > > > > > > > anything.
> > > > > > > >
> > > > > > > > > The prototype allows for testing different hashing
> > > > > > > > > algorithms, and indeed xxHash seems to be the best
> > > > > > > > > general-purpose choice. I’ve also added support for more
> > > > > > > > > specialized hardware-based hashes, like Casey Muratori’s
> > > > > > > > > Meow Hash (uses hardware AES SSE 4.2 instructions), which
> > > > > > > > > brings the figures down a bit.
> > > > > > > >
> > > > > > > > I remembered Meow hashes needing at least k bytes of data, but
> > > > > > > > looking at their website right now there is no mention of it.
> > > > > > > > Hashing performance isn't much of an impact as we do it per
> > > > > > > > .obj file distributed through our company, so the time to
> > > > > > > > calculate those is completely distributed.
> > > > > > > >
> > > > > > > > > Future changes could write the computed ghash stream back to
> > > > > > > > > OBJs if /INCREMENTAL is specified (just an idea).
> > > > > > > > > Incrementally linking will be faster that way when working
> > > > > > > > > with MSVC OBJs.
> > > > > > > >
> > > > > > > > I already have a patch for llvm-objcopy that adds a
> > > > > > > > -add-ghashes option that does this, I will be cleaning it up
> > > > > > > > this week and sending a PR for it
> > > > > > > >
> > > > > > > > > As for creating PDBs for independent projects, that would
> > > > > > > > > help most likely. However the ghash stream would need to be
> > > > > > > > > stored in the PDB in that case (currently, ghashes are
> > > > > > > > > dropped after merging). That could help when using rarely
> > > > > > > > > compiled projects, used along with network caches.
> > > > > > > >
> > > > > > > > I meant actually a .lib, with all the obj files inside plus a
> > > > > > > > merged .debug$H entry. No pdb generation or changes necessary,
> > > > > > > > we just run the same code that merges types in lld and do that
> > > > > > > > at the librarian level.
> > > > > > > >
> > > > > > > > > I will start sending smaller patches to converge towards the
> > > > > > > > > functionality of the prototype above.
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > > Alex.
> > > > > > > > >
> > > > > > > > > From: Zachary Turner <zturner at google.com>
> > > > > > > > > Sent: Sunday, February 24, 2019 1:20 AM
> > > > > > > > > To: Leonardo Santagada <santagada at gmail.com>
> > > > > > > > > Cc: Alexandre Ganea <alexandre.ganea at ubisoft.com>; Reid Kleckner <rnk at google.com>; llvm-dev <llvm-dev at lists.llvm.org>
> > > > > > > > > Subject: Re: [llvm-dev] Making LLD PDB generation faster
> > > > > > > > >
> > > > > > > > > +Reid and Alexandre, who have been doing work in this area
> > > > > > > > > recently
> > > > > > > > >
> > > > > > > > > On Sat, Feb 23, 2019 at 4:07 AM Leonardo Santagada via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> > > > > > > > >
> > > > > > > > > Hi,
> > > > > > > > >
> > > > > > > > > Is anyone working on making the PDB generation on LLD
> > > > > > > > > faster? Looking at a trace for linking one of our binaries
> > > > > > > > > (it takes 1min6s-1min20s) I see two things:
> > > > > > > > >
> > > > > > > > > 1) LookupBucketFor(Val, ConstFoundBucket); takes 35s so
> > > > > > > > >    almost half of the time of linking, mostly finding
> > > > > > > > >    duplicates
> > > > > > > > > 2) There is no parallelization inside of addObjectsToPDB
> > > > > > > > >
> > > > > > > > > Is anyone working on those? Also has anyone thought about
> > > > > > > > > merging .obj files to deduplicate type information so we can
> > > > > > > > > do the linking on projects to generate something like a lib
> > > > > > > > > file, but with deduplicated debug information (as far as I
> > > > > > > > > know actual .lib just put all pdbs or /Z7 debug info inside
> > > > > > > > > a file without dedup).
> > > > > > > > >
> > > > > > > > > Just looking at the code it seems it is much more mature and
> > > > > > > > > also the choice of SHA1_8 seems interesting (still don't
> > > > > > > > > know why not use xxHash64).
> > > > > > > > >
> > > > > > > > > ps: My code to add ghashes to msvc compiled .obj files is
> > > > > > > > > almost ready to be pushed as an option for llvm-objcopy.
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > Leonardo Santagada
> > > > > > > > > _______________________________________________
> > > > > > > > > LLVM Developers mailing list
> > > > > > > > > llvm-dev at lists.llvm.org
> > > > > > > > > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
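
The quoted thread keeps coming back to one mechanism: each type record carries a precomputed global hash (the .debug$H entries), and merging amounts to inserting that hash into a hash map and remapping type indices, which is where the LookupBucketFor time is reported. A minimal single-threaded sketch of that idea follows. It assumes 8-byte hashes and uses std::unordered_map as a stand-in for LLVM's DenseMap; GHash, TypeIndex, IndexMap and MergedTypes are hypothetical names for illustration, not LLD's actual types.

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

using GHash = std::uint64_t;      // assumed: an 8-byte truncated global type hash
using TypeIndex = std::uint32_t;  // index into the merged type stream

// For one object file: source type position -> index in the merged stream.
using IndexMap = std::vector<TypeIndex>;

// Hypothetical merged-type table, not LLD's real data structure.
struct MergedTypes {
    std::unordered_map<GHash, TypeIndex> seen;  // hash -> merged index
    std::vector<GHash> records;                 // one entry per unique type

    // Deduplicate one object's types by hash and return its remapping table.
    IndexMap mergeObject(const std::vector<GHash>& objHashes) {
        IndexMap remap;
        remap.reserve(objHashes.size());
        for (GHash h : objHashes) {
            // try_emplace only inserts when the hash has not been seen yet.
            auto [it, inserted] =
                seen.try_emplace(h, static_cast<TypeIndex>(records.size()));
            if (inserted)
                records.push_back(h);  // first occurrence becomes the canonical record
            remap.push_back(it->second);
        }
        return remap;
    }
};
```

Every lookup goes through the single `seen` map, so with hundreds of millions of input types the map probe dominates, which is consistent with the profile described in the thread.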
Leonardo Santagada via llvm-dev
2019-Feb-27 23:17 UTC
[llvm-dev] Making LLD PDB generation faster
My problem was that some library was still built with LTO, and I think that
forces lld to do LTO, but unlike MSVC it doesn't emit any warning about it.

I think we are going to try sharding the ghashes across multiple threads and
have a hashmap that only contains the index into a list of seen types. This
way we hope there's no need for any locking.

Also we are investigating why we have 420 million types being linked while it
appears that 95-99% of them are not being used. Does anyone know if PCH can
help here? My feeling is not much, as template instantiation still generates a
ton of weak symbols in the PCH users, but I might be confused.

Another idea is to create lib files with types and hashes merged. Where is
the llvm-lib source? It seems funny, but I can't find it anywhere.

Ps: the cmake bug for using llvm in Visual Studio projects has been fixed
upstream.

On Wed, 27 Feb 2019 at 20:10, Reid Kleckner <rnk at google.com> wrote:

> This could be ICF. There were lots of issues with ICF on ARM64, but they
> are not inherently ARM64-specific; they just come up there more often. See
> https://reviews.llvm.org/D56986 which fixes that.
>
> Easiest thing is always to profile or add /time to see what's slow.
>> > > > >> > > >> > > > >> > > I meant actually a .lib, with all the obj files inside plus >> a merged .debug$H entry. No pdb generation or changes necessary, we just >> run the same code that merges types in lld and do that a the librarian >> level. >> > > > >> > > >> > > > >> > > > >> > > > >> > > > I will start sending smaller patches to converge towards >> the >> > > > >> > > > functionally of >> > > > >> > > > >> > > > >> > > > the prototype above. >> > > > >> > > > >> > > > >> > > > >> > > > >> > > > >> > > > >> > > > Best, >> > > > >> > > > >> > > > >> > > > Alex. >> > > > >> > > > >> > > > >> > > > >> > > > >> > > > >> > > > >> > > > From: Zachary Turner <zturner at google.com> >> > > > >> > > > Sent: Sunday, February 24, 2019 1:20 AM >> > > > >> > > > To: Leonardo Santagada <santagada at gmail.com> >> > > > >> > > > Cc: Alexandre Ganea <alexandre.ganea at ubisoft.com>; Reid >> > > > >> > > > Kleckner <rnk at google.com>; llvm-dev < >> llvm-dev at lists.llvm.org> >> > > > >> > > > Subject: Re: [llvm-dev] Making LLD PDB generation faster >> > > > >> > > > >> > > > >> > > > >> > > > >> > > > >> > > > >> > > > +Reid and Alexandre, who have been doing work in this area >> > > > >> > > > +recently >> > > > >> > > > >> > > > >> > > > >> > > > >> > > > >> > > > >> > > > On Sat, Feb 23, 2019 at 4:07 AM Leonardo Santagada via >> llvm-dev <llvm-dev at lists.llvm.org> wrote: >> > > > >> > > > >> > > > >> > > > Hi, >> > > > >> > > > >> > > > >> > > > Is anyone working on making the PDB generation on LLD >> faster? >> > > > >> > > > Looking of a trace for linking one of our binaries (it >> takes >> > > > >> > > > 1min6s-1min20s) I see two things: >> > > > >> > > > >> > > > >> > > > 1) LookupBucketFor(Val, ConstFoundBucket); takes 35s so >> > > > >> > > > almost half of the time of linking, mostly finding >> duplicates >> > > > >> > > > 2) There is no parallelization inside of addObjectsToPDB >> > > > >> > > > >> > > > >> > > > Is anyone working on those? 
Also has anyone thought about >> > > > >> > > > merging .obj files to deduplicate type infomation so we can >> > > > >> > > > do the linking on projects to generate something like a lib >> > > > >> > > > file, but deduplicated debug information (as far as I know >> > > > >> > > > actual .lib just put all pdbs or >> > > > >> > > > /Z7 debug info inside a file without dedup). >> > > > >> > > > >> > > > >> > > > Just looking at the code it seems it is much more mature >> and >> > > > >> > > > also the choice of SHA1_8 seems interesting (still don't >> know >> > > > >> > > > why not use xxHash64). >> > > > >> > > > >> > > > >> > > > ps: My code to add ghashes to msvc compiled .obj files is >> > > > >> > > > almost ready to be pushed as an option for llvm-objcopy. >> > > > >> > > > >> > > > >> > > > -- >> > > > >> > > > >> > > > >> > > > Leonardo Santagada >> > > > >> > > > _______________________________________________ >> > > > >> > > > LLVM Developers mailing list >> > > > >> > > > llvm-dev at lists.llvm.org >> > > > >> > > > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >> > > > >> > > >> > > > >> > > >> > > > >> > > >> > > > >> > > -- >> > > > >> > > >> > > > >> > > Leonardo Santagada >> > > > >> > >> > > > >> > >> > > > >> > >> > > > >> > -- >> > > > >> > >> > > > >> > Leonardo Santagada >> > > > >> >> > > > >> >> > > > >> >> > > > >> -- >> > > > >> >> > > > >> Leonardo Santagada >> > > > >> > > > >> > > > >> > > > -- >> > > > >> > > > Leonardo Santagada >> > > >> > > >> > > >> > > -- >> > > >> > > Leonardo Santagada >> > >> > >> > >> > -- >> > >> > Leonardo Santagada >> >> >> >> -- >> >> Leonardo Santagada >> > --Leonardo Santagada -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20190228/1601d2f4/attachment-0001.html>
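The quoted discussion above relies on global type hashes (ghashes) stored in .debug$H so that identical types from different object files can be matched without comparing the records themselves. A minimal sketch of the idea follows, assuming a simplified record shape of (payload, references); the names ghash_records and merge are hypothetical and this is not LLD's actual implementation. The point it shows is that a record's hash covers its own bytes plus the hashes of the records it refers to, so it does not depend on the local type indices of any one object file.

```python
import hashlib


def ghash_records(records):
    """Return one 8-byte global hash per record of a single object file.

    Each record is (payload_bytes, refs), where refs are indices of earlier
    records in the same file. The hash covers the payload plus the global
    hashes of the referenced records, never the local indices themselves.
    """
    hashes = []
    for payload, refs in records:
        h = hashlib.sha1(payload)
        for ref in refs:
            h.update(hashes[ref])
        hashes.append(h.digest()[:8])  # truncated SHA-1, in the spirit of SHA1_8
    return hashes


def merge(objects):
    """Merge the records of several object files; returns {ghash: merged index}."""
    merged = {}
    for records in objects:
        for gh in ghash_records(records):
            merged.setdefault(gh, len(merged))
    return merged


# Two object files describe the same types under different local indices.
a = [(b"int", []), (b"struct Foo", [0]), (b"ptr", [1])]
b = [(b"float", []), (b"int", []), (b"struct Foo", [1]), (b"ptr", [2])]
merged = merge([a, b])
print(len(merged))  # 4: int, struct Foo, ptr, float
```

Because the hashes are precomputed per object file, the linker's remaining work is the hash-table insertion, which matches the observation in the thread that the time ends up in the DenseMap lookup rather than in hashing.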
Zachary Turner via llvm-dev
2019-Feb-27 23:27 UTC
[llvm-dev] Making LLD PDB generation faster
On Wed, Feb 27, 2019 at 3:17 PM Leonardo Santagada <santagada at gmail.com> wrote:

> Also we are investigating why we have 420 million types being linked while
> it appears that 95-99 % of them are not being used. De anyone know if pch
> can help here? My feeling is not much as template instantiation still
> generates a ton of weak symbols on the pch users, but I might be confused.

This probably has to do with the fact that most types are duplicates. If there is a class Foo in a header file, and you include that header file in 100 different translation units, all 100 of them will get full type information for that class in their object files. This is the /Z7 semantics that clang-cl implements. The alternative is /Zi, which uses a type server (an out-of-process de-duplicator) to do this merging at compile time. That is what cl uses by default, and clang-cl doesn't support it.
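To put rough numbers on the duplication described above, here is a small sketch under made-up inputs. The function count_type_records and the file and type names are hypothetical, and real /Z7 objects carry many more records per class; the sketch only counts how many type records all object files emit versus how many remain once identical ones are merged.

```python
def count_type_records(includes, header_types, local_types):
    """Return (emitted, unique) type-record counts for a set of translation units.

    includes:     {tu_name: [header, ...]}
    header_types: {header: [type_name, ...]}
    local_types:  {tu_name: [type_name, ...]}
    """
    emitted = 0
    unique = set()
    for tu, headers in includes.items():
        records = list(local_types.get(tu, []))
        for header in headers:
            records.extend(header_types[header])
        emitted += len(records)  # every object file carries its own copy
        unique.update(records)   # what is left once the linker merges them
    return emitted, len(unique)


tus = [f"tu{i}.cpp" for i in range(100)]
includes = {tu: ["foo.h"] for tu in tus}
header_types = {"foo.h": ["class Foo", "Foo::bar()"]}
local_types = {tu: [f"{tu}::Local"] for tu in tus}

emitted, unique = count_type_records(includes, header_types, local_types)
print(emitted, unique)  # 300 102
print(f"{100 * (emitted - unique) / emitted:.0f}% of the emitted records are duplicates")  # 66%
```

With only one shared header the duplicate share is already about two thirds; with many widely included headers it would climb further, which is at least consistent with the 95-99 % figure quoted above.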
Alexandre Ganea via llvm-dev
2019-Feb-28 00:13 UTC
[llvm-dev] Making LLD PDB generation faster
As for multithreaded ghashes: even if the hashtable stores 32-bit indices to SeenHashes, you would still need to compare the ghashes for collisions:
https://github.com/llvm/llvm-project/blob/master/llvm/include/llvm/ADT/DenseMap.h#L627
Finding the 32-bit index in the hashtable doesn’t necessarily mean it’s the right one.

The following table shows the collision distribution when inserting (type) ghashes into the DenseMap. This is for a fairly large EXE, comparable to yours I suppose. It shows that 65% of hits (insertions or fetches) are resolved on the first bucket accessed. But there’s still 35% which requires querying more buckets in the hashtable, up to 54 buckets.

Buckets accessed        Count    Percent
               1  134,994,464    65.551%
               2   35,478,867    17.228%
               3   15,658,999     7.604%
               4    8,045,798     3.907%
               5    4,540,451     2.205%
               6    2,634,179     1.279%
               7    1,608,599     0.781%
               8    1,007,705     0.489%
               9      643,471     0.312%
              10      418,645     0.203%
              11      279,816     0.136%
              12      189,733     0.092%
              13      129,686     0.063%
              14       90,484     0.044%
              15       62,863     0.031%
              16       43,584     0.021%
              17       31,240     0.015%
              18       22,180     0.011%
              19       15,850     0.008%
              20       11,266     0.005%
              21        8,171     0.004%
              22        5,900     0.003%
              23        4,379     0.002%
              24        3,167     0.002%
              25        2,316     0.001%
              26        1,681     0.001%
              27        1,185     0.001%
              28          901     0.000%
              29          638     0.000%
              30          465     0.000%
              31          367     0.000%
              32          280     0.000%
              33          189     0.000%
              34          140     0.000%
              35          106     0.000%
              36           76     0.000%
              37           55     0.000%
              38           37     0.000%
              39           27     0.000%
              40           20     0.000%
              41           18     0.000%
              42           15     0.000%
              43           13     0.000%
              44           10     0.000%
              45            6     0.000%
              46            6     0.000%
              47            5     0.000%
              48            4     0.000%
              49            3     0.000%
              50            3     0.000%
              51            2     0.000%
              52            2     0.000%
              53            2     0.000%
              54            2     0.000%
           Total  205,938,071

And here is the cache miss distribution, with 8-byte buckets (value and hash), as implemented in https://reviews.llvm.org/D55585#change-nIfiq2fvl33C
So about 86% of hits (insertions or fetches) will be found on the first cacheline accessed, for any given Type record inserted.

Cachelines accessed        Count    Percent
                  1  177,774,132    86.324%
                  2   20,133,777     9.777%
                  3    4,046,867     1.965%
                  4    1,506,067     0.731%
                  5      829,202     0.403%
                  6      533,119     0.259%
                  7      348,794     0.169%
                  8      233,626     0.113%
                  9      159,738     0.078%
                 10      110,651     0.054%
                 11       76,223     0.037%
                 12       53,601     0.026%
                 13       37,271     0.018%
                 14       26,666     0.013%
                 15       19,013     0.009%
                 16       13,591     0.007%
                 17        9,667     0.005%
                 18        7,024     0.003%
                 19        5,102     0.002%
                 20        3,780     0.002%
                 21        2,767     0.001%
                 22        2,003     0.001%
                 23        1,423     0.001%
                 24        1,041     0.001%
                 25          780     0.000%
                 26          565     0.000%
                 27          411     0.000%
                 28          315     0.000%
                 29          228     0.000%
                 30          164     0.000%
                 31          117     0.000%
                 32           91     0.000%
                 33           61     0.000%
                 34           42     0.000%
                 35           30     0.000%
                 36           21     0.000%
                 37           18     0.000%
                 38           16     0.000%
                 39           13     0.000%
                 40           13     0.000%
                 41            9     0.000%
                 42            6     0.000%
                 43            5     0.000%
                 44            5     0.000%
                 45            3     0.000%
                 46            3     0.000%
                 47            3     0.000%
                 48            2     0.000%
                 49            2     0.000%
                 50            2     0.000%
                 51            1     0.000%
              Total  205,938,071

Now if you split the key (64-bit ghash) from the value (32-bit TypeIndex), you would double the cache misses, right? The hashtable can quickly get pretty big, so it is very unlikely that you would hit the same cache line in the L1d (and maybe L2) on the next record. Could you please explain your algorithm in more detail?

llvm-lib is llvm-ar in disguise ;-)

From: Leonardo Santagada <santagada at gmail.com>
Sent: Wednesday, February 27, 2019 6:17 PM
To: Reid Kleckner <rnk at google.com>
Cc: Alexandre Ganea <alexandre.ganea at ubisoft.com>; Zachary Turner <zturner at google.com>; llvm-dev <llvm-dev at lists.llvm.org>
Subject: Re: [llvm-dev] Making LLD PDB generation faster

My problem was that some library was still built with lto and I think that forces lld to do lto, but contrary to msvc it doesn't do any warning about it.

I think we are going to try sharding the ghashes to multiple threads and have a hashmap that only contains the index to a list of seen types. This way we hope there's no need for any locking.

Also we are investigating why we have 420 million types being linked while it appears that 95-99 % of them are not being used. De anyone know if pch can help here? My feeling is not much as template instantiation still generates a ton of weak symbols on the pch users, but I might be confused.

Also another idea is to create lib files with types and hashes merged, where is llvm-lib source at? It seems funny but I don't seem to find it anywhere.

Ps: the cmake bug for using llvm in visual studio projects has been fixed upstream.

On Wed, 27 Feb 2019 at 20:10, Reid Kleckner <rnk at google.com> wrote:

This could be ICF. There were lots of issues with ICF on ARM64, but they are not inherently ARM64-specific, they just come up there more often. See https://reviews.llvm.org/D56986 which fixes that.

Easiest thing is always to profile or add /time to see what's slow.

On Wed, Feb 27, 2019 at 6:30 AM Leonardo Santagada <santagada at gmail.com> wrote:

Anyone would know why lld takes > 30 minutes to link lld without symbols on release?

The command line seems simple enough:

C:\PROGRA~1\LLVM\bin\lld-link.exe /nologo @CMakeFiles\lld.rsp /out:bin\lld.exe /implib:lib\lld.lib /version:0.0 /machine:x64 -fuse-ld=lld /STACK:10000000 /INCREMENTAL:NO /subsystem:console /MANIFEST /MANIFESTFILE:bin\lld.exe.manifest

On Mon, Feb 25, 2019 at 8:20 PM Leonardo Santagada <santagada at gmail.com> wrote:
>
> Sadly the patch on https://reviews.llvm.org/D55585 didn't apply on my
> clone of llvm at all :( It will take me quite some time to test this
> out.
>
> On Mon, Feb 25, 2019 at 5:08 PM Alexandre Ganea <alexandre.ganea at ubisoft.com> wrote:
> >
> > For enabling large memory pages, see this link: https://support.sisoftware.co.uk/knowledgebase.php?article=52
> >
> > Meow hash isn't in the patch I posted, but you can use xxHash, it is good enough. Just add /hasher:xxhash to the LLD cmd-line.
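The two distributions in the message above (buckets accessed and cachelines accessed per insertion or fetch) can be reproduced in miniature with a toy open-addressing table. The sketch below is a simplified stand-in, not DenseMap itself: probe_stats is a hypothetical name, the probing is triangular over a power-of-two table (the same general scheme as DenseMap's lookup loop), the table never grows, and the 64-byte cache line is an assumption. Only the 8-byte bucket size is taken from the message.

```python
import random
from collections import Counter

CACHE_LINE = 64  # bytes per cache line (assumed)
BUCKET_SIZE = 8  # bytes per bucket, as in the 8-byte buckets mentioned above


def probe_stats(keys, num_buckets):
    """Insert keys into an open-addressing table and record, per operation,
    how many buckets and how many distinct cache lines were touched."""
    assert num_buckets & (num_buckets - 1) == 0, "size must be a power of two"
    mask = num_buckets - 1
    table = [None] * num_buckets
    buckets_hist = Counter()
    lines_hist = Counter()
    for key in keys:
        idx = hash(key) & mask
        step = 1
        probes = 0
        lines = set()
        while True:
            probes += 1
            lines.add(idx * BUCKET_SIZE // CACHE_LINE)
            if table[idx] is None:  # empty bucket: insert here
                table[idx] = key
                break
            if table[idx] == key:   # duplicate found
                break
            idx = (idx + step) & mask  # triangular probing: +1, +2, +3, ...
            step += 1
        buckets_hist[probes] += 1
        lines_hist[len(lines)] += 1
    return buckets_hist, lines_hist


rng = random.Random(0)
unique = [rng.getrandbits(60) for _ in range(3000)]
keys = unique + rng.choices(unique, k=7000)  # most records are duplicates
rng.shuffle(keys)

buckets_hist, lines_hist = probe_stats(keys, 4096)
total = len(keys)
for n in sorted(buckets_hist)[:5]:
    print(n, buckets_hist[n], f"{100 * buckets_hist[n] / total:.3f}%")
```

Since consecutive probes often land in the same 64-byte line, the share of operations that touch a single cache line is never smaller than the share resolved in a single bucket, which is the relationship between the 65% and 86% figures above.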
David Blaikie via llvm-dev
2019-Feb-28 19:35 UTC
[llvm-dev] Making LLD PDB generation faster
On Wed, Feb 27, 2019 at 3:17 PM Leonardo Santagada via llvm-dev <llvm-dev at lists.llvm.org> wrote:

> My problem was that some library was still built with LTO, and I think that forces lld to do LTO, but unlike MSVC it doesn't emit any warning about it.
>
> I think we are going to try sharding the ghashes across multiple threads and have a hashmap that only contains the index into a list of seen types. This way we hope there's no need for any locking.
>
> Also we are investigating why we have 420 million types being linked while it appears that 95-99% of them are not being used. Does anyone know if PCH can help here? My feeling is not much, as template instantiation still generates a ton of weak symbols in the PCH users, but I might be confused.

PCH probably doesn't help by itself - modular code generation (https://www.youtube.com/watch?v=lYYxDXgbUZ0) can remove a lot of the duplication, but requires teaching your build system new tricks (& modularizing the code itself).

Also, I forget if Windows is using -fstandalone-debug by default, I think it did for a while (& I forget why). If there was a way to avoid that, it'd also reduce duplication.

> Also another idea is to create lib files with types and hashes merged. Where is the llvm-lib source? It seems funny, but I can't find it anywhere.
>
> Ps: the cmake bug for using llvm in Visual Studio projects has been fixed upstream.
>
> On Wed, 27 Feb 2019 at 20:10, Reid Kleckner <rnk at google.com> wrote:
>
> > This could be ICF. There were lots of issues with ICF on ARM64, but they are not inherently ARM64-specific, they just come up there more often. See https://reviews.llvm.org/D56986 which fixes that.
> >
> > Easiest thing is always to profile or add /time to see what's slow.
> >
> > On Wed, Feb 27, 2019 at 6:30 AM Leonardo Santagada <santagada at gmail.com> wrote:
> >
> > > Would anyone know why lld takes > 30 minutes to link lld without symbols on release?
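The sharding idea quoted at the top of this message can be sketched roughly as follows. This is only an illustration, not the poster's implementation or anything from LLD: `merge_shard` and `sharded_merge` are hypothetical names, hashes are plain integers, and a shard is chosen with `hash % num_shards`. Each worker owns one shard's map outright, so no locks are needed. Python threads do not run this in parallel because of the interpreter lock, so the sketch shows the structure, not the speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def merge_shard(shard_id, num_shards, hashes):
    # Each worker only touches hashes that belong to its shard, so its
    # dictionary is never shared with another thread.
    seen = {}     # hash -> local index within this shard
    unique = []   # input position of the first occurrence of each hash
    for position, h in enumerate(hashes):
        if h % num_shards != shard_id:
            continue
        if h not in seen:
            seen[h] = len(unique)
            unique.append(position)
    return seen, unique


def sharded_merge(hashes, num_shards=4):
    """Return (remap, total): remap[i] is the merged index of hashes[i]."""
    with ThreadPoolExecutor(max_workers=num_shards) as pool:
        results = list(
            pool.map(lambda s: merge_shard(s, num_shards, hashes),
                     range(num_shards))
        )
    # Prefix sums turn per-shard local indices into global indices.
    offsets = []
    total = 0
    for _seen, unique in results:
        offsets.append(total)
        total += len(unique)
    remap = []
    for h in hashes:
        shard = h % num_shards
        remap.append(offsets[shard] + results[shard][0][h])
    return remap, total


if __name__ == "__main__":
    print(sharded_merge([10, 3, 10, 7, 3, 8], num_shards=2))
```

Because every shard scans the input in the same order and the offsets are assigned by shard number, the resulting indices do not depend on thread scheduling, which matters for the build reproducibility concern raised earlier in the thread.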