Leonardo Santagada via llvm-dev
2019-Feb-23 12:06 UTC
[llvm-dev] Making LLD PDB generation faster
Hi,

Is anyone working on making PDB generation in LLD faster? Looking at a trace of linking one of our binaries (it takes 1min6s-1min20s), I see two things:

1) LookupBucketFor(Val, ConstFoundBucket); takes 35s, so almost half of the link time, mostly finding duplicates
2) There is no parallelization inside of addObjectsToPDB

Is anyone working on those? Also, has anyone thought about merging .obj files to deduplicate type information, so that per-project links could generate something like a lib file but with deduplicated debug information? (As far as I know, an actual .lib just puts all the PDBs or /Z7 debug info inside a file without dedup.)

Just looking at the code it seems it is much more mature, and the choice of SHA1_8 also seems interesting (I still don't know why not use xxHash64).

ps: My code to add ghashes to MSVC-compiled .obj files is almost ready to be pushed as an option for llvm-objcopy.

--
Leonardo Santagada
Zachary Turner via llvm-dev
2019-Feb-24 06:19 UTC
[llvm-dev] Making LLD PDB generation faster
+Reid and Alexandre, who have been doing work in this area recently
Alexandre Ganea via llvm-dev
2019-Feb-24 16:18 UTC
[llvm-dev] Making LLD PDB generation faster
Leonardo, to answer your questions: yes to all of them.

You can take a look at this prototype/proposal: reviews.llvm.org/D55585

Overall, computing ghashes in parallel at link time and merging Types with them is less costly than the current approach to merging. The 35 seconds you're seeing for merging should go down to about 15 seconds. The patch doesn't parallelize (yet) the Type merging itself, but we have an alternate multithread-suitable implementation of DenseHash which already supports lockless, wait-free insert/fetch/resize.

The prototype allows for testing different hashing algorithms, and indeed xxHash seems to be the best general-purpose choice. I've also added support for more specialized hardware-based hashes, like Casey Muratori's Meow Hash (uses hardware AES SSE 4.2 instructions), which brings the figures down a bit.

Future changes could write the computed ghash stream back to the OBJs if /INCREMENTAL is specified (just an idea). Incremental linking would be faster that way when working with MSVC OBJs.

As for creating PDBs for independent projects, that would most likely help. However, the ghash stream would need to be stored in the PDB in that case (currently, ghashes are dropped after merging). That could help with rarely compiled projects, used along with network caches.

I will start sending smaller patches to converge towards the functionality of the prototype above.

Best,
Alex.
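To make the idea of hashing in parallel and then merging by hash concrete, here is a minimal Python sketch. It is not the code from D55585 or from LLD. The record layout `(kind, payload, refs)`, the function names `global_hashes` and `merge_types`, and the use of a plain dict for the merged table are all assumptions made for illustration. The 8-byte truncated SHA-1 only echoes the SHA1_8 mentioned earlier in the thread.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

# Toy type record (hypothetical layout): (kind, payload, refs), where refs are
# indices of earlier records in the same object file. Real CodeView records
# are binary blobs; this only models the reference structure.

def global_hashes(records):
    """Hash each record together with the hashes of the records it references."""
    hashes = []
    for kind, payload, refs in records:
        h = hashlib.sha1()
        h.update(kind.encode())
        h.update(b"\0")
        h.update(payload.encode())
        for r in refs:
            # Mix in the referenced type's hash, not its local index, so the
            # result does not depend on where the type sits in this file.
            h.update(hashes[r])
        hashes.append(h.digest()[:8])  # truncate to 8 bytes
    return hashes

def merge_types(object_files, workers=4):
    """Return (merged table, per-file remap from local to merged index)."""
    # Hashing is independent per object file, so it can be farmed out.
    # The thread pool here only shows that independence; it is not a
    # performance claim about Python.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_file = list(pool.map(global_hashes, object_files))

    # Merging stays serial in this sketch: one table keyed by the short hash,
    # so a duplicate costs one lookup instead of a full record comparison.
    merged = {}
    remap = []
    for hashes in per_file:
        remap.append([merged.setdefault(h, len(merged)) for h in hashes])
    return merged, remap

if __name__ == "__main__":
    obj_a = [("LF_STRUCTURE", "Foo", []), ("LF_POINTER", "", [0])]
    obj_b = [("LF_STRUCTURE", "Bar", []),
             ("LF_STRUCTURE", "Foo", []),
             ("LF_POINTER", "", [1])]
    merged, remap = merge_types([obj_a, obj_b])
    print(len(merged), remap)
```

In the example, the pointer-to-Foo record has local index 1 in the first file and 2 in the second, yet both map to the same merged entry because the hash is built from the referenced type's hash rather than its index. A concurrent table would be needed to parallelize the merge loop itself, which this sketch does not attempt.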