On Sat, Feb 29, 2020 at 5:14 PM Nicholas Krause via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
> On 2/29/20 7:23 PM, River Riddle wrote:
>> On Sat, Feb 29, 2020 at 4:00 PM Nicholas Krause <xerofoify at gmail.com> wrote:
>>> On 2/29/20 6:17 PM, River Riddle via llvm-dev wrote:
>>>> On Sat, Feb 29, 2020 at 2:25 PM David Blaikie via llvm-dev
>>>> <llvm-dev at lists.llvm.org> wrote:
>>>>> On Sat, Feb 29, 2020 at 2:19 PM Chris Lattner <clattner at nondot.org> wrote:
>>>>>> On Feb 29, 2020, at 2:08 PM, David Blaikie <dblaikie at gmail.com> wrote:
>>>>>>>
>>>>>>>> I'm curious as to how MLIR deals with IPO, as that's the problem
>>>>>>>> I was running into.
>>>>>>>
>>>>>>> FWIW I believe LLVM's new pass manager (NPM) was designed with
>>>>>>> parallelism in mind and with the ability to support this situation
>>>>>>> (which MLIR doesn't? Or doesn't to the degree/in the way the NPM
>>>>>>> does). I'll leave it to folks (Chandler probably has the most
>>>>>>> context here) to provide more detail there if they can/have time.
>>>>>>
>>>>>> Historically speaking, all of the LLVM pass managers have been
>>>>>> designed to support multithreaded compilation (check out the ancient
>>>>>> history of the WritingAnLLVMPass
>>>>>> <http://llvm.org/docs/WritingAnLLVMPass.html> doc if curious).
>>>>>
>>>>> I think the specific thing that might've been a bit different in the
>>>>> NPM was handling analysis invalidation in a way that's more
>>>>> parallelism-friendly than the previous one - but I may be
>>>>> misrepresenting/misunderstanding some of it.
>>>>>
>>>>>> The problem is that LLVM has global use-def chains on constants,
>>>>>> functions, globals, etc., so it is impractical to do this. Every
>>>>>> "inst->setOperand" would have to be able to take locks or use
>>>>>> something like software transactional memory techniques in its
>>>>>> implementation. This would be very complicated and very slow.
>>>>>
>>>>> Oh, yeah - I recall that particular limitation being discussed/not
>>>>> addressed as yet.
>>>>>
>>>>>> MLIR defines this away from the beginning. This is a result of the
>>>>>> core IR design, not the pass manager design itself.
>>>>>
>>>>> What does MLIR do differently here/how does it define that issue
>>>>> away? (Does it not have use-lists built in?)
>>>>
>>>> The major thing is that constants and global-like objects don't
>>>> produce SSA values and thus don't have use-lists.
>>>> https://mlir.llvm.org/docs/Rationale/#multithreading-the-compiler
>>>> discusses this a bit.
>>>>
>>>> For constants, the data is stored as an Attribute (context-uniqued
>>>> metadata that has no use-list and is not SSA). This attribute can
>>>> either be placed in the attribute list (if the operand is always
>>>> constant, like the value of a switch case), or it must be explicitly
>>>> materialized via some operation. For example, the `std.constant
>>>> <https://mlir.llvm.org/docs/Dialects/Standard/#constant-operation>`
>>>> operation will materialize an SSA value from some attribute data.
>>>>
>>>> For references to functions and other global-like objects, we have a
>>>> non-SSA mechanism built around `symbols`. This essentially uses a
>>>> special attribute to reference the function by name instead of by SSA
>>>> value. You can find more information on MLIR symbols here
>>>> <https://mlir.llvm.org/docs/SymbolsAndSymbolTables/>.
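>>>>
>>>> As a minimal sketch of how that looks in the textual IR (the function
>>>> names here are hypothetical, using the current standard dialect):
>>>>
>>>>   func @caller() -> i32 {
>>>>     // `constant` materializes an SSA value from the IntegerAttr
>>>>     // `42 : i32`; the attribute itself has no use-list.
>>>>     %c = constant 42 : i32
>>>>     // `@callee` is a symbol reference attribute - a by-name
>>>>     // reference, not an SSA use of a global object.
>>>>     %r = call @callee(%c) : (i32) -> i32
>>>>     return %r : i32
>>>>   }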
>>>> Along with the above, there is a trait that can be attached to
>>>> operations called `IsolatedFromAbove
>>>> <https://mlir.llvm.org/docs/Traits/#isolatedfromabove>`. This
>>>> essentially means that no SSA values defined above a region can be
>>>> referenced from within that region. The pass manager only allows
>>>> scheduling passes on operations that have this property, meaning that
>>>> all pipelines are implicitly multi-threaded.
>>>>
>>>> The pass manager in MLIR was heavily inspired by the work on the new
>>>> pass manager in LLVM, but with specific constraints/requirements that
>>>> are unique to the design of MLIR. That being said, there are some
>>>> usability features added that would also make great additions to
>>>> LLVM: instance-specific pass options and statistics, pipeline crash
>>>> reproducer generation, etc.
>>>>
>>>> Not sure if any of the above helps clarify, but happy to chat more if
>>>> you are interested.
>>>>
>>>> -- River
>>>>
>>>>> - Dave
>>>
>>> River,
>>> The big thing from my reading of the Pass Manager in MLIR is that it
>>> allows us to iterate through a pass per function or module as a group,
>>> allowing it to run async. I've proposed this on the GCC side:
>>> https://gcc.gnu.org/ml/gcc/2020-02/msg00247.html
>>>
>>> It's to walk through the IPA passes, which are similar to analysis
>>> passes on the LLVM side.
>>
>> Hi Nicholas,
>>
>> I can't say anything about the GCC side, but this isn't a particularly
>> novel aspect of the MLIR pass manager. In many ways, the pass manager
>> is the easiest/simplest part of the multi-threading problem. The bigger
>> problem is making sure that the rest of the compiler infrastructure is
>> structured in a way that is thread-safe, or can be made thread-safe.
>> This is why most of the discussion is based around how to model things
>> like constants, global values, etc. When I made MLIR multi-threaded a
>> year ago, the large majority of my time was spent outside of the pass
>> manager. For a real example, I spent much more time just on
>> multi-threaded pass timing
>> <https://mlir.llvm.org/docs/WritingAPass/#multi-threaded-pass-timing>
>> than on making the pass manager itself multi-threaded.
>>
>> -- River
>
> Actually, in my experience the biggest problem is whether we can detect
> IPO and provide async guarantees around it. MLIR runs operations, but
> only for a module or set of functions, without this. One of my dreams
> would be to run passes in parallel, including IPO detection, and stop
> if we cannot continue past an IPO pass or set of passes due to changes.
>
> Maybe MLIR does do that, but it's the one bottleneck that is really
> hard to fix.

What MLIR does (that would require quite some work in LLVM) is making
sure that you can process and transform functions in isolation, allowing
*local* optimizations to run in parallel. This does not solve the IPO
problem you're after. As I understand it, this is a difficult thing to
design, and it requires reconsidering how you think about the passes and
the pass pipeline entirely.

Running function passes and "local" optimizations in parallel in LLVM
isn't possible because the structures in the LLVMContext aren't
thread-safe, and because the IR itself isn't thread-safe. Something as
simple as DCE'ing or CSE'ing away a function call requires modifying the
callee (through its use-list).

--
Mehdi
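To make that last point concrete, here is a minimal C++ sketch (the
helper is hypothetical, but the LLVM IR APIs are real) of why erasing
even a provably dead call is not a function-local edit:

  #include "llvm/IR/Function.h"
  #include "llvm/IR/Instructions.h"

  // Hypothetical helper: erase a call we have proven dead. The "local"
  // edit below unlinks a Use from the callee Function's use-list, which
  // is global, module-wide state shared with every other caller.
  void deleteDeadCall(llvm::CallInst *Call) {
    llvm::Function *Callee = Call->getCalledFunction();
    unsigned UsesBefore = Callee ? Callee->getNumUses() : 0;
    Call->eraseFromParent(); // mutates Callee's use-list
    // Two threads doing this for calls in *different* functions still
    // race on the same list: Callee->getNumUses() is now UsesBefore - 1.
    (void)UsesBefore;
  }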
On 2/29/20 9:38 PM, Mehdi AMINI wrote:
> [snip - earlier thread quoted above]
> What MLIR does (that would require quite some work in LLVM) is making
> sure that you can process and transform functions in isolation,
> allowing *local* optimizations to run in parallel. This does not solve
> the IPO problem you're after. As I understand it, this is a difficult
> thing to design, and it requires reconsidering how you think about the
> passes and the pass pipeline entirely.

I've been thinking about it on the GCC side for the last few months.
It's non-trivial, though something similar to this may be possible; I
will need to research both pass managers better:
https://elixir.bootlin.com/linux/latest/source/include/linux/workqueue.h#L102
I'm looking at some RFCs for the pass manager and SSA iterators on the
GCC side.
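For the sort of thing that link describes, a minimal sketch (hypothetical,
plain std::thread rather than either compiler's actual infrastructure)
might look like:

  #include <functional>
  #include <mutex>
  #include <queue>
  #include <thread>
  #include <vector>

  // A work queue of independent per-function jobs drained by a fixed
  // pool of workers. Jobs must not touch shared, unsynchronized state -
  // which is exactly the hard part discussed in this thread.
  class WorkQueue {
    std::queue<std::function<void()>> Jobs;
    std::mutex Lock;

  public:
    void push(std::function<void()> Job) {
      std::lock_guard<std::mutex> Guard(Lock);
      Jobs.push(std::move(Job));
    }

    // Drain the queue with NumWorkers threads, then wait for them all.
    void run(unsigned NumWorkers) {
      std::vector<std::thread> Workers;
      for (unsigned I = 0; I != NumWorkers; ++I)
        Workers.emplace_back([this] {
          for (;;) {
            std::function<void()> Job;
            {
              std::lock_guard<std::mutex> Guard(Lock);
              if (Jobs.empty())
                return;
              Job = std::move(Jobs.front());
              Jobs.pop();
            }
            Job(); // e.g. "run this pass over this one function"
          }
        });
      for (std::thread &W : Workers)
        W.join();
    }
  };

Usage would be one push() per function for a function-local pass, then
run(std::thread::hardware_concurrency()).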
Though one easy thing to note is that many core classes in both GCC and
LLVM contain loops that run a lot; LRA on the GCC side does for register
allocation. We may want to figure out how, and whether it is a good
idea, to run these loops in small worker functions that are only used by
the one function that requires them. The real question is that some
loops are only performance-intensive in corner cases, so we may need
heuristics to decide when to launch, e.g. based on the number of
elements being iterated over, etc. (see the sketch after this message).
Maybe that's a bad idea, but after looking through the GCC and LLVM side
a little, this seems to be a later, tedious but mostly mechanical fix:
figuring out which loops are expensive enough to be worth it. I believe
Johannes was going to reach out next week about the Module classes and
we will go from there,

Nick

> Running function passes and "local" optimizations in parallel in LLVM
> isn't possible because the structures in the LLVMContext aren't
> thread-safe, and because the IR itself isn't thread-safe. Something as
> simple as DCE'ing or CSE'ing away a function call requires modifying
> the callee (through its use-list).
>
> --
> Mehdi
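A minimal sketch of that heuristic (hypothetical, not tied to either
compiler's code base): run the loop body sequentially unless the element
count clears a threshold that makes forking workers pay for itself.

  #include <algorithm>
  #include <cstddef>
  #include <thread>
  #include <vector>

  // The threshold would have to be tuned per loop; 4096 is a placeholder.
  template <typename T, typename Fn>
  void maybeParallelFor(std::vector<T> &Items, Fn Body,
                        std::size_t Threshold = 4096) {
    if (Items.size() < Threshold) {
      // Corner-case loops stay sequential; spawning threads here would
      // cost more than it saves.
      for (T &Item : Items)
        Body(Item);
      return;
    }
    unsigned NumWorkers = std::max(1u, std::thread::hardware_concurrency());
    std::size_t Chunk = (Items.size() + NumWorkers - 1) / NumWorkers;
    std::vector<std::thread> Workers;
    for (unsigned W = 0; W != NumWorkers; ++W) {
      std::size_t Begin = W * Chunk;
      std::size_t End = std::min(Items.size(), Begin + Chunk);
      if (Begin >= End)
        break;
      // Safe only if Body touches nothing shared across chunks.
      Workers.emplace_back([&Items, &Body, Begin, End] {
        for (std::size_t I = Begin; I != End; ++I)
          Body(Items[I]);
      });
    }
    for (std::thread &W : Workers)
      W.join();
  }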
This is a recent desktop running Xubuntu 19.10, compiling clang and llvm
10.0.0. See below. For this test I am running 14 processors in a GUI VM.
The cores are hyperthreaded (processors are twice the cores), but all
the cores show negligible activity before the run.

compile_commands.json has 3022 entries. The ninja compile run lasted 7
minutes and 43 seconds with ~99% usage on all processors throughout.
That is 7*60+43 = 463 seconds of wall-clock time. Compile seconds per
entry in compile_commands.json: 463/3022 = 0.1532 seconds. With 14
processors running concurrently, the average time spent compiling each
entry would then be about 14*0.1532 ≈ 2.14 seconds.

cmake -G Ninja -DLLVM_ENABLE_PROJECTS="clang;llvm" -DLLVM_USE_LINKER=lld
-DCMAKE_BUILD_TYPE="Release" -DLLVM_TARGETS_TO_BUILD=X86
-DLLVM_ENABLE_LIBPFM=OFF -DRUN_HAVE_GNU_POSIX_REGEX=0
-DRUN_HAVE_THREAD_SAFETY_ATTRIBUTES=0 -Wno-dev ../llvm &>
/home/nnelson/Documents/cmake.log
ninja &> /home/nnelson/Documents/ninja.log

Here are some useful pages from Threading Building Blocks.

Task-Based Programming
https://software.intel.com/en-us/node/506100
Appendix A, Costs of Time Slicing
https://software.intel.com/en-us/node/506127

The point: when the number of compiles exceeds the number of cores such
that all the cores are utilized, nothing is gained by trying to
multi-thread the individual compiles. In fact, loading the cores with
more threads or tasks than there are cores will reduce compiling
efficiency because of time slicing. And sequencing through more tasks
rather than fewer when the cores are not overloaded will also reduce
efficiency, because more tasks have to be loaded onto and unloaded from
the cores.

Neil Nelson
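A quick check of the arithmetic (a throwaway snippet; the values are
copied from the run above):

  #include <cstdio>

  int main() {
    const double WallSeconds = 7 * 60 + 43; // 463 s for the whole run
    const int Jobs = 3022;                  // compile_commands.json entries
    const int Procs = 14;                   // virtual processors in the VM
    std::printf("wall-clock seconds per job: %.4f\n", WallSeconds / Jobs);
    std::printf("average seconds per job:    %.3f\n",
                Procs * WallSeconds / Jobs);
    return 0;
  }

This prints 0.1532 and 2.145, matching the figures above.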