> On Mar 26, 2020, at 10:55, Nicolai Hähnle <nhaehnle at gmail.com> wrote:
>
> On Thu, Mar 26, 2020 at 11:53 AM Florian Hahn <florian_hahn at apple.com> wrote:
>>> It also doesn't solve the problem of Functions themselves -- those are
>>> also GlobalValues…
>>
>> I am not sure why not. Function passes should only rely on the information at the call site & from the declaration, IIRC. For functions, we could add extra declarations and update the call sites. But I might be missing something.
>
> Function passes can remove, duplicate, or just plain introduce call
> sites (e.g. recognize a memset pattern), which means the use lists of
> Functions can be changed during a function pass…

Sure, but a single function won’t be processed in parallel by a function pass and would just work on the 'local version' of the globals it uses, including called functions. So a function pass adding or removing calls to existing ‘local versions’ of functions should not be a problem, I think.

One problem is that function passes can add new globals, e.g. new declarations if they introduce new calls. I guess that would require some locking, but it should happen quite infrequently.

Cheers,
Florian
On 26/03/2020 11:06, Florian Hahn via llvm-dev wrote:
> One problem is that function passes can add new globals, e.g. new declarations if they introduce new calls. I guess that would require some locking, but it should happen quite infrequently.

Can they? I have had to make a pass a ModulePass in the past because it added a global to hold a cache. The global was used only in the function being modified, but the fact that it was a global prevented the pass from being a FunctionPass.

David
> On Mar 26, 2020, at 11:09, David Chisnall via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> On 26/03/2020 11:06, Florian Hahn via llvm-dev wrote:
>> One problem is that function passes can add new globals, e.g. new declarations if they introduce new calls. I guess that would require some locking, but it should happen quite infrequently.
>
> Can they? I have had to make a pass a ModulePass in the past because it added a global to hold a cache. The global was used only in the function being modified, but the fact that it was a global prevented the pass from being a FunctionPass.

Maybe they should not, but in practice I think plenty of function passes insert new declarations. For example, I think Intrinsic::getDeclaration inserts a new declaration if the requested intrinsic is not yet declared (https://github.com/llvm/llvm-project/blob/master/llvm/lib/IR/Function.cpp#L1117), and a lot of function passes use it.

Cheers,
Florian
On Thu, Mar 26, 2020 at 12:06 PM Florian Hahn <florian_hahn at apple.com> wrote:
>> On Mar 26, 2020, at 10:55, Nicolai Hähnle <nhaehnle at gmail.com> wrote:
>> On Thu, Mar 26, 2020 at 11:53 AM Florian Hahn <florian_hahn at apple.com> wrote:
>>>> It also doesn't solve the problem of Functions themselves -- those are
>>>> also GlobalValues…
>>>
>>> I am not sure why not. Function passes should only rely on the information at the call site & from the declaration, IIRC. For functions, we could add extra declarations and update the call sites. But I might be missing something.
>>
>> Function passes can remove, duplicate, or just plain introduce call
>> sites (e.g. recognize a memset pattern), which means the use lists of
>> Functions can be changed during a function pass…
>
> Sure, but a single function won’t be processed in parallel by a function pass and would just work on the 'local version' of the globals it uses, including called functions. So a function pass adding/removing calls to existing ‘local versions’ of functions should not be a problem, I think.

Oh, I see what you're saying now. Yes, agreed that that should work.

Though by the time you're done implementing the conversion from and to that representation and have made sure you've covered all the corner cases, I'm not so sure you've really saved a lot of effort relative to just doing it right in the first place :)

> One problem is that function passes can add new globals, e.g. new declarations if they introduce new calls. I guess that would require some locking, but it should happen quite infrequently.

Agreed.

Cheers,
Nicolai

--
Lerne, wie die Welt wirklich ist, aber vergiss niemals, wie sie sein sollte.
Hello everyone,

Just to add a bit of spice to the discussion about “Multi-Threading Compilers” (sorry for just bringing high-level ideas):

We are heavy users of unity files (aka blobs or jumbos). Unity files are a big pain and add extra complexity, but at the same time they provide tremendous build-time reductions, 10x or more. Our game projects typically read more than 50,000 files during the full build of a single target, of which 20,000 are .CPPs. The same unity target compiles only 600 unity .CPPs, which between them aggregate all 20,000 of the initial .CPPs. Building the 20,000 TUs locally on a modern 3.7 GHz 6-core PC takes more than 2 hours 30 min. With unity files, it takes 20 minutes. Distributing it remotely to a pool of machines takes 5 min. Caching everything and rebuilding takes 45 sec.

However, we are now dependent on the order of files in the unities. If files or folders are added or removed in the codebase, the contents of a unity can change, and the cache is then invalidated for that unity .CPP. That happens quite often in production. Unities also induce higher build times in some cases -- spikes, as I was showing in a previous post in this thread. Without inspecting the AST, it is hard to determine an optimal “cutting” point when generating the unity .CPPs; we can end up with unities including template-heavy .CPPs which take a lot longer to compile than the other unity files.

If we are to discuss multi-threading, we are really discussing compile-time performance and how compilation will scale in the future. I think we should consider providing the functionality of unity files in the compiler (maybe behind a flag, if it's non-conformant). While I don't know exactly how that fits into this (multi-threading) discussion, efficiently coalescing the compilation of several TUs should be the compiler's responsibility, and it will likely be more efficient than doing it with a pre-build tool, as we do today.
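For readers who have not used the technique: a unity blob is just an ordinary .CPP that textually includes a batch of real TUs, so the compiler front-end runs once per blob instead of once per file. A hypothetical blob (file names invented for illustration) looks like:

```cpp
// unity_0.cpp -- generated by a pre-build tool, not hand-written.
// Any change to the set or order of included files changes this blob's
// contents, which is what invalidates its cache entry.
#include "player.cpp"
#include "physics.cpp"
#include "renderer.cpp"
// ... up to N .cpp files per blob, chosen without knowledge of how
// expensive each TU is to compile (hence the spikes mentioned above).
```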
In essence, we would provide a large number of files to Clang, say with the same options (the /MP flag is still WIP, I'll get back to that soon [1]):

clang-cl /c a.cpp b.cpp c.cpp d.cpp ... /MP

We would then expect the compiler to (somehow) share tokenization, lexing, file caching, preprocessing, compilation, optimization, and other computations across TUs, preferably in a lock-free manner. Overlapped or duplicated computations across threads, in the manner of transactions, would probably be fine if the computations are small and we want to avoid locks (but this needs to be profiled).

Also, the recent trend of NUMA processor “tiles”, as well as HBM2 memory on-chip per “tile”, could change the way multi-threaded code is written. Perhaps state would need to be duplicated into the local NUMA memory for maximum performance. Additionally, I'm not sure how (or if) lock-based programming will scale past a few hundred, or a few thousand, cores in a single image without major contention -- maybe it will, as long as locks don't cross NUMA boundaries. This needs to be considered in the design.

So while the discussion seems to be around multi-threading single TUs, it would be nice to also consider the possibility of sharing state between TUs, which perhaps means retaining runtime state in global hash table(s), and possibly persisting that state on disk, or in a DB, after the compilation. We could maybe draw a parallel with the work done by SN Systems (the program repository, see [2]), or with zapcc [3].

Thanks!
Alex.

[1] https://reviews.llvm.org/D52193
[2] https://github.com/SNSystems/llvm-project-prepo
[3] https://github.com/yrnkrn/zapcc
On 3/26/20 11:56 AM, Nicolai Hähnle via llvm-dev wrote:
> On Thu, Mar 26, 2020 at 12:06 PM Florian Hahn <florian_hahn at apple.com> wrote:
>>> On Mar 26, 2020, at 10:55, Nicolai Hähnle <nhaehnle at gmail.com> wrote:
>>> On Thu, Mar 26, 2020 at 11:53 AM Florian Hahn <florian_hahn at apple.com> wrote:
>>>>> It also doesn't solve the problem of Functions themselves -- those are
>>>>> also GlobalValues…
>>>> I am not sure why not. Function passes should only rely on the information at the call site & from the declaration, IIRC. For functions, we could add extra declarations and update the call sites. But I might be missing something.
>>> Function passes can remove, duplicate, or just plain introduce call
>>> sites (e.g. recognize a memset pattern), which means the use lists of
>>> Functions can be changed during a function pass…
>>
>> Sure, but a single function won’t be processed in parallel by a function pass and would just work on the 'local version' of the globals it uses, including called functions. So a function pass adding/removing calls to existing ‘local versions’ of functions should not be a problem, I think.
>
> Oh, I see what you're saying now. Yes, agreed that that should work.
>
> Though by the time you're done implementing the conversion from and to
> that representation and made sure you've covered all the corner cases,
> I'm not so sure you've really saved a lot of effort relative to just
> doing it right in the first place :)
>
>> One problem is that function passes can add new globals, e.g. new declarations if they introduce new calls. I guess that would require some locking, but it should happen quite infrequently.
>
> Agreed.
>
> Cheers,
> Nicolai

CCing myself back into the discussion. As to Johannes' comments: I have no problem getting data, but the discussion seems to be around implementation and what data we actually care about. That should be discussed first, IMO.

Nick