[llvm-dev] Status update on the hot/cold splitting pass [Feb 2019]


Teresa Johnson via llvm-dev

2019-Feb-05 22:38 UTC

[llvm-dev] Status update on the hot/cold splitting pass

On Mon, Jan 28, 2019 at 11:03 AM Aditya K via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> > The splitting pass currently doesn’t move cold symbols into a separate
> > section. Is that affecting your results?
> Maybe partly, the main reason is that, in the absence of good profile
> info, we aren't finding many cold blocks.
>
We noticed that the split cold functions are ending up in the regular .text
section instead of .text.unlikely. Since that is done much later than
splitting and is based on profile counts, it must be the case that profile
data is not being propagated to the split functions in some way - do you
know offhand if they are getting function_entry_count prof metadata?

The other thing we noticed is that the .text.unlikely section is also
reducing significantly, so it seems like some of the already cold blocks
are getting split - has anyone noticed that?

Teresa

> -Aditya
>
> ------------------------------
> *From:* vsk at apple.com <vsk at apple.com> on behalf of Vedant Kumar
> <vedant_kumar at apple.com>
> *Sent:* Monday, January 28, 2019 1:00 PM
> *To:* Aditya K
> *Cc:* llvm-dev at lists.llvm.org; Sebastian Pop
> *Subject:* Re: [llvm-dev] Status update on the hot/cold splitting pass
>
> The splitting pass currently doesn’t move cold symbols into a separate
> section. Is that affecting your results?
>
> On Darwin, we plan on using a symbol attribute to provide an ordering hint
> to the linker (see r352227, N_COLD_FUNC).
>
> vedant
>
> On Jan 28, 2019, at 10:51 AM, Aditya K via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
> Very happy to see good results. On our side, we are still struggling with
> getting a good profile to get aggressive hot-cold splitting. Static profile
> isn't helping much for our use cases. I'll be curious to know if someone
> got good improvements only with static profile analysis.
>
>
> -Aditya
>
> ------------------------------
> *From:* vsk at apple.com <vsk at apple.com> on behalf of Vedant Kumar
> <vedant_kumar at apple.com>
> *Sent:* Friday, January 25, 2019 6:29 PM
> *To:* llvm-dev at lists.llvm.org
> *Cc:* Aditya Kumar; Sebastian Pop; Teresa Johnson; jun.l at samsung.com;
> Duncan Smith; Gerolf Hoflehner
> *Subject:* Status update on the hot/cold splitting pass
>
> Hello,
>
> I’d like to give a status update to the community about the recently-added
> hot/cold splitting pass. I'll provide some motivation for the pass,
> describe its implementation, summarize recent/ongoing work, and share early
> results.
>
> # Motivation
>
> We (at Apple) have found that memory pressure from resident pages of code
> is significant on embedded devices. In particular, this pressure spikes
> during app launches. We’ve been looking into ways to reduce memory
> pressure. Hot/cold splitting is one part of a solution.
>
> # What does hot/cold splitting do?
>
> The hot/cold splitting pass identifies cold basic blocks and moves them
> into separate functions. The linker must order newly-created cold functions
> away from the rest of the program (say, into a cold section). The idea here
> is to have these cold pages faulted in relatively infrequently (if at all),
> and to improve the memory locality of code outside of the cold area.
>
> The pass considers profile data, traps, uses of the `cold` attribute,
> and exception-handling code to identify cold blocks. If the pass identifies
> a cold region that's profitable to extract, it uses LLVM's CodeExtractor
> utility to split the region out of its original function. Newly-created
> cold functions are marked `minsize` (-Oz). The splitting process may occur
> multiple times per function.
>
> The choice to perform splitting at the IR level gave us a lot of
> flexibility. It allowed us to quickly target different architectures and
> evaluate new phase orderings. It also made it easier to split out highly
> complex subgraphs of CFGs (with both live-ins and live-outs). One
> disadvantage is that we cannot easily split out EH pads (llvm.org/PR39545).
> However, our experiments show that doing so would only increase the total
> amount of split code by 2% across the entire iOS shared cache.
>
> # Recent/ongoing work
>
> Aditya and Sebastian contributed the hot/cold splitting pass in September
> 2018 (r341669). Since then, work on the pass has continued steadily. It
> gained the ability to extract larger cold regions (r345209), compile-time
> improvements (r351892, r351894), and a more effective cost model (r352228).
> With some experimentation, we found that scheduling splitting before
> inlining gives better code size results without regressing memory locality
> (r352080). Along the way, CodeExtractor got better at handling debug info
> (r344545, r346255), and a few other issues in this utility were fixed
> (r348205, r350420).
>
> At this point, we're able to build & run our software stack with hot/cold
> splitting enabled. We’d like to introduce a CC1 option to safely toggle
> splitting on/off (https://reviews.llvm.org/D57265). That would help
> experiment with and/or deploy the pass.
>
> # Early results
>
> On internal memory benchmarks, we consistently saw that code page faults
> were more concentrated with splitting enabled. With splitting, the set of
> the most-frequently-accessed 95% (99%) of code pages was 10% (resp. 3.6%)
> smaller. We used a facility in the xnu VM to force pages to be faulted
> periodically, and ktrace, to collect this data. We settled on this approach
> because the alternatives (e.g. directly sampling RSS of various processes)
> gave unstable results, even when measures were taken to stabilize a device
> (e.g. disabling dynamic frequency switching, SMP, and various other
> features).
>
> On arm64, the performance impact of enabling splitting in the LLVM test
> suite appears to be in the noise. We think this is because split code
> amounts to just 0.1% of all the code in the test suite. Across the iOS
> shared cache we see that 0.9% of code is split, with higher percentages in
> key frameworks (e.g. 7% in libdispatch). For three internal benchmarks, we
> see geomean score improvements of 1.58%, 0.56%, and 0.27% respectively. We
> think these results are promising. I’d like to encourage others to evaluate
> the pass and share results.
>
> Thanks!
>
> vedant
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>
>
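
The cold-block criteria in the original post quoted above can be illustrated with a small, self-contained C++ model. This is a hypothetical sketch, not the HotColdSplitting pass or the CodeExtractor API: `Block` and `findColdBlocks` are invented names. It seeds coldness from the signals the post lists (a zero profile count, a trap, a call to a `cold` function, an EH pad) and then, as an assumed simplification, also treats a block as cold when all of its successors are cold, since such a block can only lead into cold code.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical toy CFG node (not an LLVM type).
struct Block {
  std::vector<std::size_t> succs;  // indices of successor blocks
  unsigned long profileCount = 1;  // 0 models "never executed" in the profile
  bool hasTrap = false;            // block ends in a trap / unreachable
  bool callsColdFn = false;        // block calls a function marked `cold`
  bool isEHPad = false;            // exception-handling landing pad
};

// Hypothetical helper: returns indices of blocks treated as cold. Block 0 is
// the entry block and is never reported; splitting it out would leave nothing
// behind in the original function.
std::set<std::size_t> findColdBlocks(const std::vector<Block>& cfg) {
  std::vector<bool> cold(cfg.size(), false);
  // Seed coldness from the direct signals.
  for (std::size_t i = 0; i < cfg.size(); ++i) {
    const Block& b = cfg[i];
    cold[i] = b.profileCount == 0 || b.hasTrap || b.callsColdFn || b.isEHPad;
  }
  // Propagate backwards until nothing changes: a non-entry block with at
  // least one successor, all of them cold, can only lead into cold code.
  bool changed = true;
  while (changed) {
    changed = false;
    for (std::size_t i = 1; i < cfg.size(); ++i) {
      if (cold[i] || cfg[i].succs.empty()) continue;
      bool allSuccsCold = true;
      for (std::size_t s : cfg[i].succs) {
        assert(s < cfg.size() && "successor index out of range");
        allSuccsCold = allSuccsCold && cold[s];
      }
      if (allSuccsCold) {
        cold[i] = true;
        changed = true;
      }
    }
  }
  std::set<std::size_t> result;
  for (std::size_t i = 1; i < cfg.size(); ++i)
    if (cold[i]) result.insert(i);
  return result;
}
```

For example, with an entry block that branches to a hot path and to a path that only leads to a trap, both blocks on the trapping path come back as cold. In the real pass, whether such a region is actually extracted also depends on its cost model.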

-- 
Teresa Johnson |  Software Engineer |  tejohnson at google.com |
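
The working-set figures in the "Early results" section of the original post can be read as follows (this is an interpretation; the post does not spell out the computation): rank code pages by how often they were accessed, count the fewest pages that together cover 95% (or 99%) of all accesses, and compare that count with and without splitting. A minimal C++ sketch of that metric, with hypothetical function names:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical helper: smallest number of pages that together account for at
// least `fraction` of all recorded accesses, most-accessed pages first.
std::size_t pagesCovering(std::vector<std::uint64_t> accessCounts,
                          double fraction) {
  assert(fraction > 0.0 && fraction <= 1.0);
  std::sort(accessCounts.begin(), accessCounts.end(),
            std::greater<std::uint64_t>());
  const std::uint64_t total = std::accumulate(
      accessCounts.begin(), accessCounts.end(), std::uint64_t{0});
  const double target = fraction * static_cast<double>(total);
  std::uint64_t covered = 0;
  std::size_t pages = 0;
  for (std::uint64_t c : accessCounts) {
    if (static_cast<double>(covered) >= target) break;
    covered += c;
    ++pages;
  }
  return pages;
}

// Hypothetical helper: relative shrinkage of the covering set,
// e.g. 0.10 for "10% smaller".
double relativeShrink(std::size_t before, std::size_t after) {
  assert(before > 0);
  return 1.0 - static_cast<double>(after) / static_cast<double>(before);
}
```

With made-up per-page counts {50, 30, 10, 5, 3, 2} before and {60, 30, 5, 3, 2} after, the 95% set shrinks from 4 pages to 3, so `relativeShrink(4, 3)` is 0.25. These numbers are illustrative only and are not the measurements reported in the post.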

Vedant Kumar via llvm-dev

2019-Feb-05 23:45 UTC


[llvm-dev] Status update on the hot/cold splitting pass

Hi Teresa,
> On Feb 5, 2019, at 2:38 PM, Teresa Johnson via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>
> On Mon, Jan 28, 2019 at 11:03 AM Aditya K via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
> > The splitting pass currently doesn’t move cold symbols into a separate
> > section. Is that affecting your results?
> Maybe partly, the main reason is that, in the absence of good profile info,
> we aren't finding many cold blocks.
>
> We noticed that the split cold functions are ending up in the regular .text
> section instead of .text.unlikely. Since that is done much later than
> splitting and is based on profile counts, it must be the case that profile
> data is not being propagated to the split functions in some way - do you know
> offhand if they are getting function_entry_count prof metadata?
At the moment, entry counts are not propagated to the split functions. This
should explain the behavior you see.
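
To make the missing metadata concrete: given edge profile counts, a natural entry count for an outlined function is the total count on edges that enter the extracted region from outside it, assuming a single-entry region. The C++ sketch below is a hypothetical model (the `Edge` struct and `extractedEntryCount` are invented for illustration), not CodeExtractor's code; in LLVM, the resulting value would be recorded as the new function's function_entry_count metadata, e.g. via `Function::setEntryCount`.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical profiled CFG edge between two block IDs.
struct Edge {
  int from;
  int to;
  std::uint64_t count;  // how many times this edge was taken
};

// Hypothetical helper: entry count for a function outlined from `region`.
// Assuming a single-entry region, every call to the outlined function
// corresponds to one traversal of an edge from outside into the region.
std::uint64_t extractedEntryCount(const std::vector<Edge>& edges,
                                  const std::set<int>& region) {
  std::uint64_t total = 0;
  for (const Edge& e : edges) {
    const bool fromOutside = region.count(e.from) == 0;
    const bool intoRegion = region.count(e.to) != 0;
    if (fromOutside && intoRegion) total += e.count;
  }
  return total;
}
```

Edges between blocks inside the region, such as a loop back-edge, do not represent new calls to the outlined function, which is why the sketch excludes them.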

> The other thing we noticed is that the .text.unlikely section is also
> reducing significantly, so it seems like some of the already cold blocks are
> getting split - has anyone noticed that?
No, but we’ve focused on marking up select commonly-used APIs cold explicitly.
The splitting pass skips functions where PSI->isFunctionEntryCold() holds —
maybe a stronger check is necessary?
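
For reference, marking an API cold at the source level can look like the minimal C++ sketch below. `reportDivideByZero` and `checkedDivide` are hypothetical names; `__attribute__((cold))` is the GCC/Clang spelling, and Clang emits it as LLVM's `cold` function attribute, which is one of the signals the splitting pass uses to classify the calling block as cold.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical error-reporting API, marked cold because it is rarely called.
__attribute__((cold)) void reportDivideByZero(long numerator) {
  std::fprintf(stderr, "error: attempted to divide %ld by zero\n", numerator);
}

// The division stays on the hot path; the block that calls the cold
// function is the kind of block the splitting pass may move out.
bool checkedDivide(long numerator, long denominator, long* quotient) {
  assert(quotient != nullptr);
  if (denominator == 0) {
    reportDivideByZero(numerator);
    return false;
  }
  *quotient = numerator / denominator;
  return true;
}
```

Whether the error branch is actually split out still depends on the pass's cost model deciding the region is profitable to extract.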

vedant

Teresa Johnson via llvm-dev

2019-Feb-05 23:56 UTC


[llvm-dev] Status update on the hot/cold splitting pass

On Tue, Feb 5, 2019, 3:46 PM Vedant Kumar <vedant_kumar at apple.com> wrote:
> Hi Teresa,
>
> On Feb 5, 2019, at 2:38 PM, Teresa Johnson via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
> On Mon, Jan 28, 2019 at 11:03 AM Aditya K via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> > The splitting pass currently doesn’t move cold symbols into a separate
>> > section. Is that affecting your results?
>> Maybe partly, the main reason is that, in the absence of good profile
>> info, we aren't finding many cold blocks.
>>
>
> We noticed that the split cold functions are ending up in the regular
> .text section instead of .text.unlikely. Since that is done much later than
> splitting and is based on profile counts, it must be the case that profile
> data is not being propagated to the split functions in some way - do you
> know offhand if they are getting function_entry_count prof metadata?
>
>
> At the moment, entry counts are not propagated to the split functions.
> This should explain the behavior you see.
>
OK, it should be straightforward to add that; I'll take a look.
>
>
> The other thing we noticed is that the .text.unlikely section is also
> reducing significantly, so it seems like some of the already cold blocks
> are getting split - has anyone noticed that?
>
>
> No, but we’ve focused on marking up select commonly-used APIs cold
> explicitly. The splitting pass skips functions where
> PSI->isFunctionEntryCold() holds — maybe a stronger check is necessary?
>
Yeah, I'm not sure. The cold section assignment uses a slightly different
PSI interface, isFunctionColdInCallGraph, but that shouldn't be very
different. I'll need to take a closer look later and get back to you.
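
A toy model of how two coldness checks can disagree, and of why a split function with no entry count would land in .text. This is an assumption-laden sketch: `FnProfile`, `entryCold`, `coldInCallGraph`, and the cutoff value are hypothetical and only loosely approximate `ProfileSummaryInfo::isFunctionEntryCold` and `isFunctionColdInCallGraph`, whose real implementations consult the program's profile summary and differ in detail.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical per-function profile state.
struct FnProfile {
  bool hasColdAttr = false;
  bool hasEntryCount = false;           // function_entry_count present?
  std::uint64_t entryCount = 0;
  std::vector<double> blockFreqRatios;  // block frequency relative to entry
};

// Assumed cutoff standing in for the profile summary's cold threshold.
constexpr std::uint64_t kColdCutoff = 1;

bool belowColdCutoff(std::uint64_t count) { return count <= kColdCutoff; }

// Hypothetical entry-based check: cold if marked cold or if the entry
// count itself is cold.
bool entryCold(const FnProfile& f) {
  if (f.hasColdAttr) return true;
  return f.hasEntryCount && belowColdCutoff(f.entryCount);
}

// Hypothetical whole-function check: block counts are derived from the entry
// count, so without one nothing can be shown cold; otherwise the entry and
// every block must be cold.
bool coldInCallGraph(const FnProfile& f) {
  if (!f.hasEntryCount || !belowColdCutoff(f.entryCount)) return false;
  for (double ratio : f.blockFreqRatios) {
    const auto blockCount = static_cast<std::uint64_t>(
        ratio * static_cast<double>(f.entryCount));
    if (!belowColdCutoff(blockCount)) return false;
  }
  return true;
}
```

In this model, a function whose entry count is barely cold but which contains a hot loop passes the entry check and fails the whole-function check, while a split function with no entry count fails both.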

Thanks,
Teresa
