Hal Finkel via llvm-dev
2017-Feb-08 06:24 UTC
[llvm-dev] (RFC) Adjusting default loop fully unroll threshold
On 02/07/2017 05:29 PM, Sanjay Patel via llvm-dev wrote:
> Sorry if I missed it, but what machine/CPU are you using to collect
> the perf numbers?
>
> I am concerned that what may be a win on a CPU that keeps a couple of
> hundred instructions in flight and has many MB of caches will not hold
> for a small core.

In my experience, unrolling tends to help weaker cores even more than
stronger ones because it allows the instruction scheduler more
opportunities to hide latency. Obviously, instruction-cache pressure is
an important consideration, but the code size changes here seem small.

> Is the proposed change universal? Is there a way to undo it?

All of the unrolling thresholds should be target-adjustable using the
TTI::getUnrollingPreferences hook.

 -Hal

> On Tue, Feb 7, 2017 at 3:26 PM, Dehao Chen via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>
> Ping... with the updated code size impact data, any more comments?
> Any more data that would be interesting to collect?
>
> Thanks,
> Dehao
>
> On Thu, Feb 2, 2017 at 2:07 PM, Dehao Chen <dehao at google.com> wrote:
>
> Here is the code size impact for clang, chrome, and 24 Google-internal
> benchmarks (names omitted; 14, 15, and 16 are encoding/decoding
> benchmarks similar to h264). There are two columns, for thresholds 300
> and 450 respectively.
>
> I also tested the LLVM test suite. Changing the threshold to 300/450
> does not affect codegen for any binary in the test suite.
>
>             300      450
> clang       0.30%    0.63%
> chrome      0.00%    0.00%
> 1           0.27%    0.67%
> 2           0.44%    0.93%
> 3           0.44%    0.93%
> 4           0.26%    0.53%
> 5           0.74%    2.21%
> 6           0.74%    2.21%
> 7           0.74%    2.21%
> 8           0.46%    1.05%
> 9           0.35%    0.86%
> 10          0.35%    0.86%
> 11          0.40%    0.83%
> 12          0.32%    0.65%
> 13          0.31%    0.64%
> 14          4.52%    8.23%
> 15          9.90%    19.38%
> 16          9.90%    19.38%
> 17          0.68%    1.97%
> 18          0.21%    0.48%
> 19          0.99%    3.44%
> 20          0.19%    0.46%
> 21          0.57%    1.62%
> 22          0.37%    1.05%
> 23          0.78%    1.30%
> 24          0.51%    1.54%
>
> On Wed, Feb 1, 2017 at 6:08 PM, Mikhail Zolotukhin via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>
>> On Feb 1, 2017, at 4:57 PM, Xinliang David Li via llvm-dev
>> <llvm-dev at lists.llvm.org> wrote:
>>
>> clang, chrome, and some internal large apps are good candidates for
>> size metrics.
>
> I'd also add the standard LLVM test suite, just because it's the suite
> everyone in the community can use.
>
> Michael
>
>> David
>>
>> On Wed, Feb 1, 2017 at 4:47 PM, Chandler Carruth via llvm-dev
>> <llvm-dev at lists.llvm.org> wrote:
>>
>> I had suggested having size metrics from somewhat larger applications
>> such as Chrome, WebKit, or Firefox; clang itself; and maybe some of
>> our internal binaries with rough size brackets?
>>
>> On Wed, Feb 1, 2017 at 4:33 PM Dehao Chen <dehao at google.com> wrote:
>>
>> With the new data points, any comments on whether this can justify
>> setting the full unroll threshold to 300 (or any other number) in O2?
>> I can collect more data points if it's helpful.
>>
>> Thanks,
>> Dehao
>>
>> On Tue, Jan 31, 2017 at 3:20 PM, Dehao Chen <dehao at google.com> wrote:
>>
>> I re-collected the data from trunk head, with stddev data and more
>> threshold data points attached:
>>
>> Performance:
>>
>>       stddev/mean   300      450      600      750
>> 403   0.37%         0.11%    0.11%    0.09%    0.79%
>> 433   0.14%         0.51%    0.25%    -0.63%   -0.29%
>> 445   0.08%         0.48%    0.89%    0.12%    0.83%
>> 447   0.16%         3.50%    2.69%    3.66%    3.59%
>> 453   0.11%         1.49%    0.45%    -0.07%   0.78%
>> 464   0.17%         0.75%    1.80%    1.86%    1.54%
>>
>> Code size:
>>
>>       300      450      600      750
>> 403   0.56%    2.41%    2.74%    3.75%
>> 433   0.96%    2.84%    4.19%    4.87%
>> 445   2.16%    3.62%    4.48%    5.88%
>> 447   2.96%    5.09%    6.74%    8.89%
>> 453   0.94%    1.67%    2.73%    2.96%
>> 464   8.02%    13.50%   20.51%   26.59%
>>
>> Compile time changes are roughly proportional to the code size changes
>> in these experiments and are noisier, so I did not include them.
>>
>> We have >2% speedups on some Google-internal benchmarks when switching
>> the threshold from 150 to 300.
>>
>> Dehao
>>
>> On Mon, Jan 30, 2017 at 5:06 PM, Chandler Carruth
>> <chandlerc at google.com> wrote:
>>
>> On Mon, Jan 30, 2017 at 4:59 PM Mehdi Amini <mehdi.amini at apple.com>
>> wrote:
>>
>>> Another question is about PGO integration: is it already hooked up
>>> there? Should we have a more aggressive threshold in a hot function?
>>> (Assuming we're willing to spend some binary size there but not on
>>> the cold path.)
>>>
>>> I would even wire the *unrolling* the other way: just suppress
>>> unrolling in cold paths to save binary size. Rolled loops seem like
>>> a generally good thing in cold code unless they are having some
>>> larger impact (i.e., the loop itself is more expensive than the
>>> unrolled form).
>>>
>>> Agree that we could suppress unrolling in cold paths to save code
>>> size. But that's orthogonal to the proposal here. This proposal
>>> focuses on O2 performance: shall we have a different (higher) full
>>> unroll threshold than the dynamic/partial unroll threshold?
>>
>> I agree that this is (to some extent) orthogonal, and it makes sense
>> to me to differentiate the threshold for full unroll and the
>> dynamic/partial case.
>>
>> There is one issue that makes these not orthogonal.
>>
>> If even *static* profile hints will reduce some of the code size
>> increase caused by higher unrolling thresholds for non-cold code, we
>> should factor that into the tradeoff in picking where the threshold
>> goes.
>>
>> However, getting PGO into the full unroller is currently challenging
>> outside of the new pass manager. We already have some unfortunate
>> hacks around this in LoopUnswitch that are making the port of it to
>> the new PM more annoying.
--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory
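The target hook Hal refers to is the one the loop unroller queries
before applying its generic defaults. A minimal sketch of a per-target
override is shown below; the exact signature of getUnrollingPreferences
and the set of fields in TTI::UnrollingPreferences have shifted between
LLVM revisions, and "Foo"/"FooTTIImpl" are placeholder names for a
target, so treat the details as illustrative rather than authoritative.

  // Sketch of a per-target override, roughly what would live in
  // lib/Target/Foo/FooTargetTransformInfo.cpp. Field names follow
  // TTI::UnrollingPreferences as of approximately this era of LLVM.
  #include "FooTargetTransformInfo.h"  // hypothetical target header
  #include "llvm/Analysis/TargetTransformInfo.h"
  #include <algorithm>

  using namespace llvm;

  void FooTTIImpl::getUnrollingPreferences(Loop *L,
                                           TTI::UnrollingPreferences &UP) {
    // UP arrives pre-populated with the generic defaults; only adjust
    // what this target cares about.

    // Cost threshold for *fully* unrolling a loop with a known trip
    // count. A small in-order core with a modest I-cache can keep a
    // lower value even if the generic -O2 default moves to 300.
    UP.Threshold = std::min(UP.Threshold, 150u);

    // Keep partial/runtime unrolling conservative as well.
    UP.PartialThreshold = std::min(UP.PartialThreshold, 150u);
    UP.MaxCount = 4;     // at most 4 copies of the loop body
    UP.Runtime = false;  // no runtime unrolling of unknown trip counts
  }

If I remember the precedence correctly, explicit -unroll-* command-line
flags and per-loop unroll pragmas still override whatever the target
sets here, so an override like this only changes the default behavior
for that one target.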
Dehao Chen via llvm-dev
2017-Feb-10 23:21 UTC
[llvm-dev] (RFC) Adjusting default loop fully unroll threshold
Thanks, everyone, for the comments. Do we have a decision here?

Dehao
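For concreteness, the cold-path idea discussed upthread (suppress
unrolling where profile data says the code is cold, and possibly be
more aggressive where it is hot) would conceptually look something like
the sketch below. This is a hypothetical helper, not existing unroller
code; as noted in the thread, plumbing profile information into the
full unroller was still awkward outside the new pass manager.
ProfileSummaryInfo and BlockFrequencyInfo are the existing analyses
such a check would most likely build on.

  // Conceptual sketch only: back off unrolling for loops whose header
  // is cold according to PGO data. "adjustUnrollingForProfile" is a
  // hypothetical helper, not an existing pass entry point.
  #include "llvm/Analysis/BlockFrequencyInfo.h"
  #include "llvm/Analysis/LoopInfo.h"
  #include "llvm/Analysis/ProfileSummaryInfo.h"
  #include "llvm/Analysis/TargetTransformInfo.h"

  using namespace llvm;

  static void
  adjustUnrollingForProfile(Loop &L, ProfileSummaryInfo &PSI,
                            BlockFrequencyInfo &BFI,
                            TargetTransformInfo::UnrollingPreferences &UP) {
    if (!PSI.hasProfileSummary())
      return; // no profile data, leave the static thresholds alone

    if (PSI.isColdBlock(L.getHeader(), &BFI)) {
      // Cold loop: a rolled loop is fine, save the code size.
      UP.Threshold = 0;
      UP.PartialThreshold = 0;
      UP.Runtime = false;
    }
    // A hot-loop bonus (the PGO question raised upthread) would be the
    // symmetric case, e.g. scaling UP.Threshold up for hot headers.
  }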
Hal Finkel via llvm-dev
2017-Feb-10 23:23 UTC
[llvm-dev] (RFC) Adjusting default loop fully unroll threshold
On 02/10/2017 05:21 PM, Dehao Chen wrote:
> Thanks, everyone, for the comments.
>
> Do we have a decision here?

You're good to go as far as I'm concerned.

 -Hal
--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory
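Finally, for anyone who wants to reproduce the kind of size/performance
sweep shown in the tables above: the threshold being adjusted is an
ordinary cl::opt inside the loop unroller, so it can be overridden per
compilation without patching LLVM. The definition below shows the
general shape only; the exact option spelling, default, and description
in LoopUnrollPass.cpp should be checked against the tree in use (150 is
the pre-change default cited in the thread).

  // General shape of the unroll-threshold flag in
  // llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp (approximate, not
  // verbatim from the source).
  #include "llvm/Support/CommandLine.h"

  using namespace llvm;

  static cl::opt<unsigned> UnrollThreshold(
      "unroll-threshold", cl::init(150), cl::Hidden,
      cl::desc("The cost threshold for loop unrolling"));

Sweeping the threshold then amounts to passing the internal flag
through the driver, e.g. "clang -O2 -mllvm -unroll-threshold=300 foo.c"
or "opt -O2 -unroll-threshold=300 foo.ll", which is presumably how the
300/450/600/750 columns above were produced.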