> On Mar 26, 2020, at 10:55, Nicolai Hähnle <nhaehnle at gmail.com> wrote:
>
> On Thu, Mar 26, 2020 at 11:53 AM Florian Hahn <florian_hahn at apple.com> wrote:
>>> It also doesn't solve the problem of Functions themselves -- those are
>>> also GlobalValues…
>>
>> I am not sure why not. Function passes should only rely on the information at the call site & from the declaration IIRC. For functions, we could add extra declarations and update the call sites. But I might be missing something.
>
> Function passes can remove, duplicate, or just plain introduce call
> sites (e.g. recognize a memset pattern), which means the use lists of
> Functions can be changed during a function pass…

Sure, but a single function won’t be processed in parallel by a function pass and would just work on the 'local version' of the globals it uses, including called functions. So a function pass adding/removing calls to existing ‘local versions’ of functions should not be a problem, I think.

One problem is that function passes can add new globals, e.g. new declarations if they introduce new calls. I guess that would require some locking, but it should happen quite infrequently.

Cheers,
Florian
On 26/03/2020 11:06, Florian Hahn via llvm-dev wrote:
> One problem is that function passes can add new globals, e.g. new declarations if they introduce new calls. I guess that would require some locking, but it should happen quite infrequently.

Can they? I have had to make a pass a ModulePass in the past because it added a global to hold a cache. The global was used only in the function being modified, but the fact that it was a global prevented the pass from being a FunctionPass.

David
> On Mar 26, 2020, at 11:09, David Chisnall via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> On 26/03/2020 11:06, Florian Hahn via llvm-dev wrote:
>> One problem is that function passes can add new globals, e.g. new declarations if they introduce new calls. I guess that would require some locking, but it should happen quite infrequently.
>
> Can they? I have had to make a pass a ModulePass in the past because it added a global to hold a cache. The global was used only in the function being modified, but the fact that it was a global prevented the pass from being a FunctionPass.

Maybe they should not, but I think in practice plenty of function passes insert new declarations. For example, Intrinsic::getDeclaration inserts a new declaration if the requested intrinsic is not yet declared (https://github.com/llvm/llvm-project/blob/master/llvm/lib/IR/Function.cpp#L1117), and a lot of function passes use it.

Cheers,
Florian
On Thu, Mar 26, 2020 at 12:06 PM Florian Hahn <florian_hahn at apple.com> wrote:
>> On Mar 26, 2020, at 10:55, Nicolai Hähnle <nhaehnle at gmail.com> wrote:
>> On Thu, Mar 26, 2020 at 11:53 AM Florian Hahn <florian_hahn at apple.com> wrote:
>>>> It also doesn't solve the problem of Functions themselves -- those are
>>>> also GlobalValues…
>>> I am not sure why not. Function passes should only rely on the information at the call site & from the declaration IIRC. For functions, we could add extra declarations and update the call sites. But I might be missing something.
>> Function passes can remove, duplicate, or just plain introduce call
>> sites (e.g. recognize a memset pattern), which means the use lists of
>> Functions can be changed during a function pass…
>
> Sure, but a single function won’t be processed in parallel by a function pass and would just work on the 'local version' of the globals it uses, including called functions. So a function pass adding/removing calls to existing ‘local versions’ of functions should not be a problem, I think.

Oh, I see what you're saying now. Yes, agreed that that should work.

Though by the time you're done implementing the conversion from and to
that representation and made sure you've covered all the corner cases,
I'm not so sure you've really saved a lot of effort relative to just
doing it right in the first place :)

> One problem is that function passes can add new globals, e.g. new declarations if they introduce new calls. I guess that would require some locking, but it should happen quite infrequently.

Agreed.

Cheers,
Nicolai

--
Lerne, wie die Welt wirklich ist, aber vergiss niemals, wie sie sein
sollte. ("Learn how the world really is, but never forget how it ought
to be.")
Hello everyone,
Just to add a bit of spice to the discussion about “Multi-Threading Compilers”:
(sorry for just bringing high-level ideas)
We are heavy users of unity files (aka blobs or jumbos). Unity files are
a big pain; they add extra complexity, but at the same time they provide
tremendous build-time reductions, 10x or more. Our game projects
typically read more than 50,000 files during the full build of a single
target, of which 20,000 are .CPPs. The same unity target compiles only
600 unity .CPPs, which themselves aggregate all of the 20,000 initial
.CPPs. Building the 20,000 TUs locally on a modern 3.7 GHz 6-core PC
takes more than 2 hours 30 min. With unity files, it takes 20 minutes.
Distributing it remotely on a pool of machines takes 5 min. Caching
everything and rebuilding takes 45 sec.
However, we are now dependent on the order of files in the unities. If
files or folders are added or removed in the codebase, the contents of a
unity can change, and thus the cache is invalidated for that unity .CPP.
And that happens quite often in production.
Unities also induce higher build times in some cases (spikes), as I
showed in a previous post in this thread. Without inspecting the AST, it
is hard to determine an optimal “cutting” point when building the unity
.CPPs. We can end up with unities including template-heavy .CPPs, which
will take a lot longer than other unity files.
If we are to discuss multi-threading, this means we are discussing compile-time
performance and how compilation would scale in the future. I think we should
consider the functionality of unity files in the compiler (maybe behind a flag
if it’s non-conformant).
While I don't know exactly how that fits into this (multi-threading)
discussion, efficiently coalescing the compilation of several TUs should
be the compiler's responsibility, and it will likely be more efficient
than doing it with a pre-build tool, as we do today.
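For context, the pre-build aggregation we do today is conceptually very
simple. The sketch below is hypothetical (it is not our actual in-house
tool, and the names are made up); it just chunks a target's list of
.cpp files into generated unity TUs that #include them:

```cpp
#include <cstdio>
#include <map>
#include <string>
#include <vector>

// Hypothetical sketch of a pre-build "unity" generator: group the .cpp
// files of a target into chunks and emit one generated TU per chunk
// that simply #includes all of them.
std::map<std::string, std::string>
makeUnitySources(const std::vector<std::string> &cppFiles,
                 size_t chunkSize) {
  std::map<std::string, std::string> unities;
  for (size_t i = 0; i < cppFiles.size(); i += chunkSize) {
    char name[32];
    std::snprintf(name, sizeof(name), "unity_%03zu.cpp", i / chunkSize);
    // The contents depend on the *order* of cppFiles, which is why
    // adding or removing files in the codebase invalidates the cache
    // for a unity TU.
    std::string contents;
    for (size_t j = i; j < cppFiles.size() && j < i + chunkSize; ++j)
      contents += "#include \"" + cppFiles[j] + "\"\n";
    unities[name] = contents;
  }
  return unities;
}
```

The generated unity_NNN.cpp files are then what actually gets handed to
the compiler, instead of the 20,000 individual TUs.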
In essence, we would provide a large number of files to Clang, let's say
with the same options (the /MP flag is still WIP, I'll get back to that
soon, [1]):

clang-cl /c a.cpp b.cpp c.cpp d.cpp ... /MP

and then expect the compiler to (somehow) share
tokenization-lexing-filecaching-preprocessing-compilation-optims-computations-etc
across TUs, preferably in a lock-free manner. Overlapped/duplicated
computations across threads, in the manner of transactions, would
probably be fine if the computations are small and we want to avoid
locks (but this needs to be profiled). Also, the recent trend of NUMA
processor “tiles”, as well as HBM2 memory on-chip per “tile”, could
change the way multi-threaded code is written. Perhaps state would need
to be duplicated in the local NUMA memory for maximum performance.
Additionally, I'm not sure how (or if) lock-based programming will scale
past a few hundred, or a few thousand, cores in a single image without
major contention. Maybe it can, as long as locks don't cross NUMA
boundaries. This needs to be considered in the design.
So while the discussion seems to be around multi-threading single TUs,
it would be nice to also consider the possibility of sharing state
between TUs, which maybe means retaining runtime state in global hash
table(s), and possibly persisting that state on disk, or in a DB, after
compilation. We could perhaps draw a parallel with the work done by SN
Systems (Program Repo, see [2]) or zapcc [3].
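A global table shared between concurrent TU compilations would need to
keep lock contention low. One common approach (a generic sketch under
my own assumptions, not an existing LLVM or Program Repo API) is to
shard the table so that threads mostly lock different shards:

```cpp
#include <functional>
#include <mutex>
#include <optional>
#include <string>
#include <unordered_map>

// Sketch of a sharded global cache for state shared across TU
// compilations (e.g. memoized results keyed by content hash). Sharding
// keeps most lookups from contending on a single lock; the names and
// the granularity here are hypothetical.
template <typename V, size_t NumShards = 64>
class ShardedCache {
  struct Shard {
    std::mutex Lock;
    std::unordered_map<std::string, V> Map;
  };
  Shard Shards[NumShards];

  Shard &shardFor(const std::string &Key) {
    // Pick a shard from the key's hash, so a given key always maps to
    // the same shard and unrelated keys spread across shards.
    return Shards[std::hash<std::string>{}(Key) % NumShards];
  }

public:
  void put(const std::string &Key, V Value) {
    Shard &S = shardFor(Key);
    std::lock_guard<std::mutex> G(S.Lock);
    S.Map[Key] = std::move(Value);
  }

  std::optional<V> get(const std::string &Key) {
    Shard &S = shardFor(Key);
    std::lock_guard<std::mutex> G(S.Lock);
    auto It = S.Map.find(Key);
    if (It == S.Map.end())
      return std::nullopt;
    return It->second;
  }
};
```

Per the NUMA point above, one could imagine going further and keeping
one such structure per NUMA domain, so locks never cross a domain
boundary.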
Thanks!
Alex.
[1] https://reviews.llvm.org/D52193
[2] https://github.com/SNSystems/llvm-project-prepo
[3] https://github.com/yrnkrn/zapcc
On 3/26/20 11:56 AM, Nicolai Hähnle via llvm-dev wrote:
> On Thu, Mar 26, 2020 at 12:06 PM Florian Hahn <florian_hahn at apple.com> wrote:
>>> On Mar 26, 2020, at 10:55, Nicolai Hähnle <nhaehnle at gmail.com> wrote:
>>> On Thu, Mar 26, 2020 at 11:53 AM Florian Hahn <florian_hahn at apple.com> wrote:
>>>>> It also doesn't solve the problem of Functions themselves -- those are
>>>>> also GlobalValues…
>>>> I am not sure why not. Function passes should only rely on the information at the call site & from the declaration IIRC. For functions, we could add extra declarations and update the call sites. But I might be missing something.
>>> Function passes can remove, duplicate, or just plain introduce call
>>> sites (e.g. recognize a memset pattern), which means the use lists of
>>> Functions can be changed during a function pass…
>>
>> Sure, but a single function won’t be processed in parallel by a function pass and would just work on the 'local version' of the globals it uses, including called functions. So a function pass adding/removing calls to existing ‘local versions’ of functions should not be a problem, I think.
> Oh, I see what you're saying now. Yes, agreed that that should work.
>
> Though by the time you're done implementing the conversion from and to
> that representation and made sure you've covered all the corner cases,
> I'm not so sure you've really saved a lot of effort relative to just
> doing it right in the first place :)
>
>> One problem is that function passes can add new globals, e.g. new declarations if they introduce new calls. I guess that would require some locking, but it should happen quite infrequently.
> Agreed.
>
> Cheers,
> Nicolai

CCing myself back into the discussion. As for Johannes' comments, I have
no problem getting data, but the discussion seems to be around
implementation and what data we actually care about. That should be
discussed first, IMO.

Nick