> On Mar 26, 2020, at 10:55, Nicolai Hähnle <nhaehnle at gmail.com> wrote:
>
> On Thu, Mar 26, 2020 at 11:53 AM Florian Hahn <florian_hahn at apple.com> wrote:
>>> It also doesn't solve the problem of Functions themselves -- those are
>>> also GlobalValues…
>>
>> I am not sure why not. Function passes should only rely on the information at the call site & from the declaration IIRC. For functions, we could add extra declarations and update the call sites. But I might be missing something.
>
> Function passes can remove, duplicate, or just plain introduce call
> sites (e.g. recognize a memset pattern), which means the use lists of
> Functions can be changed during a function pass…

Sure, but a single function won’t be processed in parallel by a function pass and would just work on the 'local version' of the globals it uses, including called functions. So a function pass adding/removing calls to existing ‘local versions’ of functions should not be a problem, I think.

One problem is that function passes can add new globals, e.g. new declarations if they introduce new calls. I guess that would require some locking, but it should happen quite infrequently.

Cheers,
Florian
On 26/03/2020 11:06, Florian Hahn via llvm-dev wrote:
> One problem is that function passes can add new globals, e.g. new declarations if they introduce new calls. I guess that would require some locking, but it should happen quite infrequently.

Can they? I have had to make a pass a ModulePass in the past because it added a global to hold a cache. The global was used only in the function being modified, but the fact that it was a global prevented the pass from being a FunctionPass.

David
> On Mar 26, 2020, at 11:09, David Chisnall via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> On 26/03/2020 11:06, Florian Hahn via llvm-dev wrote:
>> One problem is that function passes can add new globals, e.g. new declarations if they introduce new calls. I guess that would require some locking, but it should happen quite infrequently.
>
> Can they? I have had to make a pass a ModulePass in the past because it added a global to hold a cache. The global was used only in the function being modified, but the fact that it was a global prevented the pass from being a FunctionPass.

Maybe they should not, but I think in practice plenty of function passes insert new declarations. For example, Intrinsic::getDeclaration inserts a new declaration if the requested intrinsic is not yet declared (https://github.com/llvm/llvm-project/blob/master/llvm/lib/IR/Function.cpp#L1117), and a lot of function passes use it.

Cheers,
Florian
On Thu, Mar 26, 2020 at 12:06 PM Florian Hahn <florian_hahn at apple.com> wrote:
>> On Mar 26, 2020, at 10:55, Nicolai Hähnle <nhaehnle at gmail.com> wrote:
>> On Thu, Mar 26, 2020 at 11:53 AM Florian Hahn <florian_hahn at apple.com> wrote:
>>>> It also doesn't solve the problem of Functions themselves -- those are
>>>> also GlobalValues…
>>> I am not sure why not. Function passes should only rely on the information at the call site & from the declaration IIRC. For functions, we could add extra declarations and update the call sites. But I might be missing something.
>> Function passes can remove, duplicate, or just plain introduce call
>> sites (e.g. recognize a memset pattern), which means the use lists of
>> Functions can be changed during a function pass…
>
> Sure, but a single function won’t be processed in parallel by a function pass and would just work on the 'local version' of the globals it uses, including called functions. So a function pass adding/removing calls to existing ‘local versions’ of functions should not be a problem, I think.

Oh, I see what you're saying now. Yes, agreed that that should work.

Though by the time you're done implementing the conversion from and to
that representation and made sure you've covered all the corner cases,
I'm not so sure you've really saved a lot of effort relative to just
doing it right in the first place :)

> One problem is that function passes can add new globals, e.g. new declarations if they introduce new calls. I guess that would require some locking, but it should happen quite infrequently.

Agreed.

Cheers,
Nicolai

--
Lerne, wie die Welt wirklich ist, aber vergiss niemals, wie sie sein
sollte. ("Learn how the world really is, but never forget how it ought
to be.")
Hello everyone,
Just to add a bit of spice to the discussion about “Multi-Threading Compilers”:
(sorry for just bringing high-level ideas)
We are heavy users of unity files (aka blobs or jumbos). Unity files are
a big pain; they add extra complexity, but at the same time they provide
tremendous build-time reductions, 10x or more. Our game projects
typically read more than 50,000 files during the full build of a single
target, of which 20,000 are .CPPs. The same unity target compiles only
600 unity .CPPs, which themselves aggregate all of the 20,000 initial
.CPPs. Building the 20,000 TUs locally on a modern 3.7 GHz 6-core PC
takes more than 2 hours 30 min. With unity files, it takes 20 minutes.
Distributing it remotely on a pool of machines takes 5 min. Caching
everything and rebuilding takes 45 sec.
However, we are now dependent on the order of files in the unities. If
files or folders are added or removed in the codebase, the contents of a
unity can change, and thus the cache is invalidated for that unity .CPP.
And that happens quite often in production.
Unities also induce higher build times in some cases (spikes), as I
showed in a previous post in this thread. Without inspecting the AST, it
is hard to determine an optimal “cutting” point when building the unity
.CPPs. We can end up with unities including template-heavy .CPPs, which
will take a lot longer than other unity files.
If we are to discuss multi-threading, this means we are discussing compile-time
performance and how compilation would scale in the future. I think we should
consider the functionality of unity files in the compiler (maybe behind a flag
if it’s non-conformant).
While I don't know exactly how that fits into this (multi-threading)
discussion, efficiently coalescing the compilation of several TUs should
be the compiler's responsibility, and it will likely be more efficient
than doing it with a pre-build tool, as we do today.
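For context, the pre-build aggregation we do today is conceptually very
simple. The sketch below is hypothetical (it is not our actual in-house
tool, and the names are made up); it just chunks a target's list of
.cpp files into generated unity TUs that #include them:

```cpp
#include <cstdio>
#include <map>
#include <string>
#include <vector>

// Hypothetical sketch of a pre-build "unity" generator: group the .cpp
// files of a target into chunks and emit one generated TU per chunk
// that simply #includes all of them.
std::map<std::string, std::string>
makeUnitySources(const std::vector<std::string> &cppFiles,
                 size_t chunkSize) {
  std::map<std::string, std::string> unities;
  for (size_t i = 0; i < cppFiles.size(); i += chunkSize) {
    char name[32];
    std::snprintf(name, sizeof(name), "unity_%03zu.cpp", i / chunkSize);
    // The contents depend on the *order* of cppFiles, which is why
    // adding or removing files in the codebase invalidates the cache
    // for a unity TU.
    std::string contents;
    for (size_t j = i; j < cppFiles.size() && j < i + chunkSize; ++j)
      contents += "#include \"" + cppFiles[j] + "\"\n";
    unities[name] = contents;
  }
  return unities;
}
```

The generated unity_NNN.cpp files are then what actually gets handed to
the compiler, instead of the 20,000 individual TUs.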
In essence, we would provide a large number of files to Clang, let's say
with the same options (the /MP flag is still WIP, I'll get back to that
soon, [1]):

clang-cl /c a.cpp b.cpp c.cpp d.cpp ... /MP

and then expect the compiler to (somehow) share
tokenization-lexing-filecaching-preprocessing-compilation-optims-computations-etc
across TUs, preferably in a lock-free manner. Overlapped/duplicated
computations across threads, in the manner of transactions, would
probably be fine if the computations are small and we want to avoid
locks (but this needs to be profiled). Also, the recent trend of NUMA
processor “tiles”, as well as HBM2 memory on-chip per “tile”, could
change the way multi-threaded code is written. Perhaps state would need
to be duplicated in the local NUMA memory for maximum performance.
Additionally, I'm not sure how (or if) lock-based programming will scale
past a few hundred, or a few thousand, cores in a single image without
major contention. Maybe it can, as long as locks don't cross NUMA
boundaries. This needs to be considered in the design.
So while the discussion seems to be around multi-threading single TUs,
it would be nice to also consider the possibility of sharing state
between TUs, which maybe means retaining runtime state in global hash
table(s), and possibly persisting that state on disk, or in a DB, after
compilation. We could perhaps draw a parallel with the work done by SN
Systems (Program Repo, see [2]) or zapcc [3].
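A global table shared between concurrent TU compilations would need to
keep lock contention low. One common approach (a generic sketch under
my own assumptions, not an existing LLVM or Program Repo API) is to
shard the table so that threads mostly lock different shards:

```cpp
#include <functional>
#include <mutex>
#include <optional>
#include <string>
#include <unordered_map>

// Sketch of a sharded global cache for state shared across TU
// compilations (e.g. memoized results keyed by content hash). Sharding
// keeps most lookups from contending on a single lock; the names and
// the granularity here are hypothetical.
template <typename V, size_t NumShards = 64>
class ShardedCache {
  struct Shard {
    std::mutex Lock;
    std::unordered_map<std::string, V> Map;
  };
  Shard Shards[NumShards];

  Shard &shardFor(const std::string &Key) {
    // Pick a shard from the key's hash, so a given key always maps to
    // the same shard and unrelated keys spread across shards.
    return Shards[std::hash<std::string>{}(Key) % NumShards];
  }

public:
  void put(const std::string &Key, V Value) {
    Shard &S = shardFor(Key);
    std::lock_guard<std::mutex> G(S.Lock);
    S.Map[Key] = std::move(Value);
  }

  std::optional<V> get(const std::string &Key) {
    Shard &S = shardFor(Key);
    std::lock_guard<std::mutex> G(S.Lock);
    auto It = S.Map.find(Key);
    if (It == S.Map.end())
      return std::nullopt;
    return It->second;
  }
};
```

Per the NUMA point above, one could imagine going further and keeping
one such structure per NUMA domain, so locks never cross a domain
boundary.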
Thanks!
Alex.
[1] https://reviews.llvm.org/D52193
[2] https://github.com/SNSystems/llvm-project-prepo
[3] https://github.com/yrnkrn/zapcc
On 3/26/20 11:56 AM, Nicolai Hähnle via llvm-dev wrote:
> On Thu, Mar 26, 2020 at 12:06 PM Florian Hahn <florian_hahn at apple.com> wrote:
>>> On Mar 26, 2020, at 10:55, Nicolai Hähnle <nhaehnle at gmail.com> wrote:
>>> On Thu, Mar 26, 2020 at 11:53 AM Florian Hahn <florian_hahn at apple.com> wrote:
>>>>> It also doesn't solve the problem of Functions themselves -- those are
>>>>> also GlobalValues…
>>>> I am not sure why not. Function passes should only rely on the information at the call site & from the declaration IIRC. For functions, we could add extra declarations and update the call sites. But I might be missing something.
>>> Function passes can remove, duplicate, or just plain introduce call
>>> sites (e.g. recognize a memset pattern), which means the use lists of
>>> Functions can be changed during a function pass…
>>
>> Sure, but a single function won’t be processed in parallel by a function pass and would just work on the 'local version' of the globals it uses, including called functions. So a function pass adding/removing calls to existing ‘local versions’ of functions should not be a problem, I think.
> Oh, I see what you're saying now. Yes, agreed that that should work.
>
> Though by the time you're done implementing the conversion from and to
> that representation and made sure you've covered all the corner cases,
> I'm not so sure you've really saved a lot of effort relative to just
> doing it right in the first place :)
>
>> One problem is that function passes can add new globals, e.g. new declarations if they introduce new calls. I guess that would require some locking, but it should happen quite infrequently.
> Agreed.
>
> Cheers,
> Nicolai

CCing myself back into the discussion. As for Johannes' comments, I have
no problem getting data, but the discussion seems to be around
implementation and what data we actually care about. That should be
discussed first, IMO.

Nick