thr3ads.net - llvm dev - [llvm-dev] LLD: time to enable --threads by default [Nov 2016]

Rui Ueyama via llvm-dev

2016-Nov-17 03:20 UTC

[llvm-dev] LLD: time to enable --threads by default

Here is the result of running 20 threads on 20 physical cores (40 virtual
cores).

      19002.081139 task-clock (msec)         #    2.147 CPUs utilized            ( +-  2.88% )
            23,006 context-switches          #    0.001 M/sec                    ( +-  2.24% )
             1,491 cpu-migrations            #    0.078 K/sec                    ( +- 22.50% )
         2,607,076 page-faults               #    0.137 M/sec                    ( +-  0.83% )
    56,818,049,785 cycles                    #    2.990 GHz                      ( +-  2.54% )
    41,072,435,357 stalled-cycles-frontend   #   72.29% frontend cycles idle     ( +-  3.36% )
   <not supported> stalled-cycles-backend
    41,090,608,917 instructions              #    0.72  insns per cycle
                                             #    1.00  stalled cycles per insn  ( +-  0.46% )
     7,621,825,115 branches                  #  401.105 M/sec                    ( +-  0.52% )
       139,383,452 branch-misses             #    1.83% of all branches          ( +-  0.18% )

       8.848611242 seconds time elapsed                                          ( +-  2.72% )

and this is the single-thread result.

      12738.416627 task-clock (msec)         #    1.000 CPUs utilized            ( +-  5.04% )
             1,283 context-switches          #    0.101 K/sec                    ( +-  5.49% )
                 3 cpu-migrations            #    0.000 K/sec                    ( +- 55.20% )
         2,614,435 page-faults               #    0.205 M/sec                    ( +-  2.52% )
    41,732,843,312 cycles                    #    3.276 GHz                      ( +-  5.76% )
    26,816,171,736 stalled-cycles-frontend   #   64.26% frontend cycles idle     ( +-  8.48% )
   <not supported> stalled-cycles-backend
    39,776,444,917 instructions              #    0.95  insns per cycle
                                             #    0.67  stalled cycles per insn  ( +-  0.84% )
     7,288,624,141 branches                  #  572.177 M/sec                    ( +-  1.02% )
       135,684,171 branch-misses             #    1.86% of all branches          ( +-  0.12% )

      12.734335840 seconds time elapsed                                          ( +-  5.03% )
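
As a rough way to read the two runs side by side, the sketch below derives the wall-clock speedup and the extra CPU work from the figures above. The numbers are copied from the two perf outputs; the dictionary and function names are only illustrative.

```python
# Figures copied from the two perf stat runs in this message.
threaded = {
    "task_clock_ms": 19002.081139,
    "elapsed_s": 8.848611242,
    "cycles": 56_818_049_785,
    "instructions": 41_090_608_917,
}
single = {
    "task_clock_ms": 12738.416627,
    "elapsed_s": 12.734335840,
    "cycles": 41_732_843_312,
    "instructions": 39_776_444_917,
}


def ratio(threaded_value, single_value):
    """How many times larger the threaded figure is than the single-thread one."""
    return threaded_value / single_value


# Wall-clock speedup: single-thread elapsed time over threaded elapsed time.
speedup = single["elapsed_s"] / threaded["elapsed_s"]
# Total CPU time spent, threaded relative to single-threaded.
cpu_time_ratio = ratio(threaded["task_clock_ms"], single["task_clock_ms"])
# Extra instructions and cycles executed by the threaded run.
instruction_ratio = ratio(threaded["instructions"], single["instructions"])
cycle_ratio = ratio(threaded["cycles"], single["cycles"])

print(f"wall-clock speedup: {speedup:.2f}x")
print(f"CPU time ratio:     {cpu_time_ratio:.2f}x")
print(f"instruction ratio:  {instruction_ratio:.2f}x")
print(f"cycle ratio:        {cycle_ratio:.2f}x")
```

Read this way, the threaded link finishes roughly 1.44x faster while spending roughly 1.49x the CPU time. The instruction count barely changes, so most of the extra cost shows up as additional cycles rather than additional work.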


On Wed, Nov 16, 2016 at 6:13 PM, Joerg Sonnenberger via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
> On Wed, Nov 16, 2016 at 05:26:23PM -0800, Rui Ueyama wrote:
> > Did you see this
> > http://llvm.org/viewvc/llvm-project?view=revision&revision=287140 ?
> > Interpreting these numbers may be tricky because of hyper threading,
> > though.
>
> Can you try that with a CPU set that explicitly doesn't include the HT
> cores? That's more likely to give a reasonable answer for "what is the
> thread overhead".
>
> Joerg
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
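
For readers who want to try the experiment Joerg describes, here is a minimal sketch of building a CPU set with one logical CPU per physical core on Linux. It assumes the sysfs topology files under /sys/devices/system/cpu are available; the helper names are hypothetical and nothing here is taken from LLD itself.

```python
import os


def parse_cpu_list(text):
    """Parse a Linux cpulist string such as "0,20" or "0-3,8" into sorted ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-")
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def one_cpu_per_core(sibling_lists):
    """Keep the lowest-numbered hardware thread of every physical core.

    sibling_lists holds one thread_siblings_list string per logical CPU.
    """
    return sorted({parse_cpu_list(text)[0] for text in sibling_lists})


def read_sibling_lists(root="/sys/devices/system/cpu"):
    """Read thread_siblings_list for every cpuN directory (Linux only)."""
    lists = []
    for name in sorted(os.listdir(root)):
        path = os.path.join(root, name, "topology", "thread_siblings_list")
        if name.startswith("cpu") and name[3:].isdigit() and os.path.exists(path):
            with open(path) as handle:
                lists.append(handle.read())
    return lists


def pin_to_physical_cores():
    """Restrict this process (and children it starts later) to one CPU per core."""
    cpus = one_cpu_per_core(read_sibling_lists())
    os.sched_setaffinity(0, cpus)  # Linux-only call
    return cpus
```

Calling `pin_to_physical_cores()` before launching the linker as a child process is one way to keep it off the sibling hyper-threads. The same list of CPU numbers could instead be passed to `taskset -c`. Which logical CPUs are siblings differs between machines, so the set should be read from the system rather than guessed.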

Peter Smith via llvm-dev

2016-Nov-17 09:48 UTC

I've no objections to changing the default, as multi-threading can
always be turned off and most users stand to benefit from it.

One thing to consider is whether multi-threading would increase
memory usage. I'm most concerned about virtual address space, as this
can get eaten up very quickly on a 32-bit machine, particularly when
debug info is used. Given that the data set isn't increased when enabling
multiple threads, I speculate that the biggest risk would be different
threads mmapping overlapping parts of the files in a non-shared way.

It will be worth keeping track of how much memory is being used, as
people may need to lower their maximum number of parallel link jobs to
compensate. From prior experience, building clang with debug info on a
16 GB machine using -j8 will bring it to a halt.

Peter
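
To make the point about capping parallel link jobs concrete, here is a small sketch that picks a job count from a memory budget. The per-link peak memory figure, the reserve, and the function name are all assumptions made for illustration; none of them is a measurement from this thread.

```python
def max_parallel_links(total_ram_gb, per_link_peak_gb, reserve_gb=2.0, cpu_count=8):
    """Largest number of concurrent link jobs that fits in RAM.

    per_link_peak_gb is an assumed peak resident size of one link job and
    would have to be measured for a real build. reserve_gb is memory left
    for the rest of the system. At least one job is always allowed.
    """
    usable_gb = max(total_ram_gb - reserve_gb, 0)
    by_memory = int(usable_gb // per_link_peak_gb)
    return max(1, min(cpu_count, by_memory))


# Hypothetical example: a 16 GB machine where one debug link peaks at 4 GB.
print(max_parallel_links(16, 4.0))
```

LLVM's CMake build has an `LLVM_PARALLEL_LINK_JOBS` option (used with the Ninja generator) that can take a number chosen this way, so compile jobs can stay at -j8 while links are limited separately.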




Renato Golin via llvm-dev

2016-Nov-17 09:49 UTC

On 17 November 2016 at 03:20, Rui Ueyama via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
> Here is the result of running 20 threads on 20 physical cores (40 virtual
> cores).
>
>       19002.081139 task-clock (msec)         #    2.147 CPUs utilized
>       12738.416627 task-clock (msec)         #    1.000 CPUs utilized

Sounds like threading isn't beneficial much beyond the second CPU...
Maybe blindly creating one thread per core isn't the best plan...

--renato
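
For reference, the "CPUs utilized" figure quoted here is the total task-clock divided by the elapsed wall-clock time. The sketch below recomputes it from the numbers in Rui's message; the function name and the efficiency calculation are illustrative additions.

```python
def cpus_utilized(task_clock_ms, elapsed_s):
    """Average number of busy CPUs: CPU time divided by wall-clock time."""
    return task_clock_ms / (elapsed_s * 1000.0)


# task-clock and elapsed time copied from the two perf stat runs.
threaded = cpus_utilized(19002.081139, 8.848611242)
single = cpus_utilized(12738.416627, 12.734335840)

# Share of the 20 requested threads that were busy on average.
efficiency = threaded / 20

print(f"threaded: {threaded:.3f} CPUs utilized")
print(f"single:   {single:.3f} CPUs utilized")
print(f"average utilization of 20 threads: {efficiency:.1%}")
```

Because the figure is an average over the whole run, it says nothing about how many cores are busy during any particular phase of the link.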

Rafael Espíndola via llvm-dev

2016-Nov-17 12:11 UTC

> Sounds like threading isn't beneficial much beyond the second CPU...
> Maybe blindly creating one thread per core isn't the best plan...

parallel.h is pretty simplistic at the moment. Currently it creates
one thread per SMT hardware thread. Creating one per core, and doing so
lazily, would probably be a good thing, but threading is already
beneficial, and improving parallel.h would be a welcome improvement.

Cheers,
Rafael

Rui Ueyama via llvm-dev

2016-Nov-17 17:10 UTC

On Thu, Nov 17, 2016 at 1:49 AM, Renato Golin <renato.golin at linaro.org>
wrote:
> On 17 November 2016 at 03:20, Rui Ueyama via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
> > Here is the result of running 20 threads on 20 physical cores (40
> > virtual cores).
> >
> >       19002.081139 task-clock (msec)         #    2.147 CPUs utilized
> >       12738.416627 task-clock (msec)         #    1.000 CPUs utilized
>
> Sounds like threading isn't beneficial much beyond the second CPU...
> Maybe blindly creating one thread per core isn't the best plan...

That's an average. When it's at peak, it's using more than two cores.
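
The difference between average and peak utilization can be illustrated with a time-weighted average over phases. The phase durations and core counts below are invented purely for illustration and are not measurements of LLD.

```python
def average_cpus(phases):
    """Time-weighted average of busy CPUs over (duration_s, cpus_busy) phases."""
    total_time = sum(duration for duration, _ in phases)
    busy_time = sum(duration * cpus for duration, cpus in phases)
    return busy_time / total_time


def peak_cpus(phases):
    """Highest number of busy CPUs seen in any phase."""
    return max(cpus for _, cpus in phases)


# Hypothetical run: a long serial phase followed by two short parallel phases.
phases = [(6.0, 1.0), (2.0, 4.0), (1.0, 8.0)]

print(f"average: {average_cpus(phases):.2f} CPUs")
print(f"peak:    {peak_cpus(phases):.0f} CPUs")
```

In this made-up run the average stays below 2.5 CPUs even though eight cores are busy at the peak, which is the kind of pattern that a single "CPUs utilized" number hides.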