Davide Italiano via llvm-dev
2016-Nov-18 04:04 UTC
[llvm-dev] LLD: time to enable --threads by default
On Thu, Nov 17, 2016 at 7:34 PM, Rui Ueyama <ruiu at google.com> wrote:
> On Thu, Nov 17, 2016 at 6:30 PM, Davide Italiano <davide at freebsd.org> wrote:
>>
>> On Thu, Nov 17, 2016 at 1:20 PM, Rafael Espíndola via llvm-dev
>> <llvm-dev at lists.llvm.org> wrote:
>> >>
>> >> Thank you for the explanation! That makes sense.
>> >>
>> >> Unlike ThinLTO, each thread in LLD consumes a very small amount of
>> >> memory (probably just a few megabytes), so that's not a problem for
>> >> me. At the final stage of linking, we spawn threads to copy section
>> >> contents and apply relocations, and I guess that causes a lot of
>> >> memory traffic because that's basically memcpy'ing input files to an
>> >> output file, so the memory bandwidth could be a limiting factor
>> >> there. But I do not see a reason to limit the number of threads to
>> >> the number of physical cores. For LLD, it seems like we can just
>> >> spawn as many threads as HT provides.
>> >
>> > It is quite common for SMT to *not* be profitable. I did notice some
>> > small wins by not using it. On an Intel machine you can do a quick
>> > check by running with half the threads, since they always have 2x SMT.
>> >
>>
>> I had the same experience. Ideally I would like to have a way to
>> override the number of threads used by the linker.
>> gold has a plethora of options for doing that, i.e.
>>
>>   --thread-count COUNT          Number of threads to use
>>   --thread-count-initial COUNT  Number of threads to use in initial pass
>>   --thread-count-middle COUNT   Number of threads to use in middle pass
>>   --thread-count-final COUNT    Number of threads to use in final pass
>>
>> I don't think we need the full generality/flexibility of
>> initial/middle/final, but --thread-count could be useful (at least for
>> experimenting). The current interface of `parallel_for_each` doesn't
>> allow specifying the number of threads to run, so, assuming lld goes
>> that route (it may not), it should be extended accordingly.
>
> I agree that these options would be useful for testing, but I'm
> reluctant to expose them as user options because I wish LLD would just
> work out of the box without turning lots of knobs.
>

I share your view that lld should work fine out of the box. An
alternative might be to make the option hidden. The set of users who
tinker with linker options is not large, but some people do like to
override/"tune" the linker, so IMHO we should expose a sane default and
let users decide whether they care (a similar example is what we do for
--thinlto-threads or --lto-partitions, even if in the latter case we
still have it set to 1 because it's not entirely clear what a
reasonable number is).

-- 
Davide

"There are no solved problems; there are only problems that are more
or less solved" -- Henri Poincare
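For concreteness, here is a minimal sketch of what an extended
parallel_for_each could look like. The signature and chunking scheme
below are illustrative rather than lld's actual implementation, and the
sketch assumes random-access iterators:

#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Illustrative only: split [Begin, End) into NumThreads contiguous
// chunks and run F over each chunk on its own worker thread. A thread
// count of 0 or 1 falls back to a plain serial loop.
template <class Iter, class Func>
void parallel_for_each(Iter Begin, Iter End, Func F,
                       unsigned NumThreads =
                           std::thread::hardware_concurrency()) {
  std::ptrdiff_t Total = End - Begin;
  if (NumThreads <= 1 || Total <= 1) {
    std::for_each(Begin, End, F);
    return;
  }
  std::ptrdiff_t Chunk = (Total + NumThreads - 1) / NumThreads;
  std::vector<std::thread> Workers;
  for (std::ptrdiff_t Lo = 0; Lo < Total; Lo += Chunk) {
    std::ptrdiff_t Hi = std::min(Lo + Chunk, Total);
    Workers.emplace_back([=] { std::for_each(Begin + Lo, Begin + Hi, F); });
  }
  for (std::thread &T : Workers)
    T.join();
}

With an overload along these lines, a --thread-count value would just be
plumbed through as the last argument, and Rafael's no-SMT experiment
becomes passing std::thread::hardware_concurrency() / 2.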
Davide Italiano via llvm-dev
2016-Nov-18 04:09 UTC
[llvm-dev] LLD: time to enable --threads by default
On Thu, Nov 17, 2016 at 8:04 PM, Davide Italiano <davide at freebsd.org> wrote:
> On Thu, Nov 17, 2016 at 7:34 PM, Rui Ueyama <ruiu at google.com> wrote:
>> [...]
>> I agree that these options would be useful for testing, but I'm
>> reluctant to expose them as user options because I wish LLD would just
>> work out of the box without turning lots of knobs.
>>
>
> I share your view that lld should work fine out of the box. An
> alternative might be to make the option hidden. The set of users who
> tinker with linker options is not large, but some people do like to
> override/"tune" the linker, so IMHO we should expose a sane default and
> let users decide whether they care (a similar example is what we do for
> --thinlto-threads or --lto-partitions, even if in the latter case we
> still have it set to 1 because it's not entirely clear what a
> reasonable number is).
>

I've seen a case where the linker was pinned to a specific subset of
the CPUs and many linker invocations were launched in parallel
(actually, this is the only time I've seen --threads for gold used).
I personally don't expect this to be the common use case, but it's not
hard to imagine complex build systems adopting a similar strategy.

-- 
Davide

"There are no solved problems; there are only problems that are more
or less solved" -- Henri Poincare
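In the pinned scenario above, defaulting to
std::thread::hardware_concurrency() would oversubscribe the CPUs the
process is actually allowed to run on. A hedged sketch of a
Linux-specific default that respects the affinity mask instead
(sched_getaffinity and CPU_COUNT are glibc facilities; the surrounding
defaulting logic is illustrative, not lld's code):

#define _GNU_SOURCE  // for CPU_COUNT on glibc
#include <sched.h>
#include <thread>

// Prefer the number of CPUs in our affinity mask (e.g. when a build
// system launches us under taskset or a cpuset), falling back to the
// total hardware thread count, and to 1 if even that is unknown.
static unsigned defaultThreadCount() {
  cpu_set_t Set;
  CPU_ZERO(&Set);
  if (sched_getaffinity(0, sizeof(Set), &Set) == 0)
    if (int N = CPU_COUNT(&Set))
      return static_cast<unsigned>(N);
  unsigned HW = std::thread::hardware_concurrency();
  return HW ? HW : 1;
}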
Rui Ueyama via llvm-dev
2016-Nov-18 17:25 UTC
[llvm-dev] LLD: time to enable --threads by default
Sure. If you want to add --thread-count (but not other options, such as
--thread-count-initial), that's fine with me.

On Thu, Nov 17, 2016 at 8:09 PM, Davide Italiano <davide at freebsd.org> wrote:
> [...]
>
> I've seen a case where the linker was pinned to a specific subset of
> the CPUs and many linker invocations were launched in parallel
> (actually, this is the only time I've seen --threads for gold used).
> I personally don't expect this to be the common use case, but it's not
> hard to imagine complex build systems adopting a similar strategy.
>
> --
> Davide
>
> "There are no solved problems; there are only problems that are more
> or less solved" -- Henri Poincare
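Concretely, here is one way such a hidden knob could be declared with
LLVM's cl::opt command-line library. The option name and default below
are illustrative, and lld's actual drivers parse flags through their
own option tables rather than cl::opt:

#include "llvm/Support/CommandLine.h"

// Illustrative only: a knob that exists for experiments but is not
// advertised in -help output, matching the "sane default, hidden
// override" approach discussed above. 0 means "choose automatically".
static llvm::cl::opt<unsigned> ThreadCount(
    "thread-count",
    llvm::cl::desc("Number of threads to use (0 = choose automatically)"),
    llvm::cl::init(0), llvm::cl::Hidden);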