thr3ads.net - llvm dev - [LLVMdev] FW: Capabilities of Clang's PGO (e.g. improving code density) [May 2015]

Lee Hunt

2015-May-27 17:11 UTC

[LLVMdev] Capabilities of Clang's PGO (e.g. improving code density)

Thanks! CIL [LeeHu] for a few comments…

From: Xinliang David Li [mailto:xinliangli at gmail.com]
Sent: Wednesday, May 27, 2015 9:29 AM
To: Lee Hunt
Cc: llvmdev at cs.uiuc.edu
Subject: Re: [LLVMdev] Capabilities of Clang's PGO (e.g. improving code
density)

On Tue, May 26, 2015 at 8:47 PM, Lee Hunt <leehu at exchange.microsoft.com> wrote:
Hello –

I’m an engineer in Microsoft Office looking into the possible advantages of
using PGO for our Android applications.

We at Microsoft have deep experience with Visual C++’s Profile Guided
Optimization <https://msdn.microsoft.com/en-us/library/e7k32f4k.aspx>
and often see 10% or more reduction in the size of application code loaded after
using PGO for key scenarios (e.g. application launch).

Yes. This is true for GCC too.  Clang's PGO does not shrink code size
yet.

[LeeHu] Note: I’m not talking about shrinking code size, but rather reordering
it such that only ‘active’ branches within the profiled functions are grouped
together in ‘hot’ code pages.  This is a very big optimization for us in VC++
toolchain in PGO.
We also have the “/LTCG” flag – which is seemingly similar to the “-flto” Clang
flag -- that *does* shrink code by various means (dead code removal, common IL
tree collapsing) because it can see all the object code for an entire produced
target binary (e.g. .exe or .dll).
Does -flto also shrink code?

 Making application launch quickly is very important to us, and reducing the
number of code pages loaded helps with this goal.

Before we dig into turning it on, I’m wondering if there are any pre-existing
research / case studies about possible code page reduction seen from other Clang
PGO-enabled applications?  It sounds like there are some possible instrumented-run
performance problems due to counter contention, resulting in sluggish
performance and perhaps skewed profile data:
https://groups.google.com/forum/#!topic/llvm-dev/cDqYgnxNEhY

Counter contention is one issue. Redundant counter updates are another major
issue (due to the early instrumentation). We are working on the latter and see
great speedups.
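
As a rough illustration of what a redundant counter update can look like: in a function with a single if/else, counting the function entry and the 'then' side is enough, because the 'else' count can be derived from the other two. This is only a sketch of the general flow-conservation idea, not a description of Clang's instrumentation or of the work mentioned above. The names BranchProfile, g_profile and classify are made up for this example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical counters for one function that contains a single if/else.
struct BranchProfile {
  std::uint64_t entry = 0;       // times the function was entered
  std::uint64_t then_taken = 0;  // times the 'then' side ran

  // No counter is kept for the 'else' side: every entry takes exactly one
  // of the two sides, so the count follows from the other two.
  std::uint64_t else_taken() const { return entry - then_taken; }
};

static BranchProfile g_profile;  // hypothetical global profile record

// "Instrumented" function: two counter updates instead of three.
int classify(int x) {
  ++g_profile.entry;
  if (x > 0) {
    ++g_profile.then_taken;
    return 1;
  }
  return 0;  // 'else' side: no counter update needed here
}
```

Calling classify(5), classify(-1) and classify(7) leaves entry at 3 and then_taken at 2, so else_taken() reports 1 without the 'else' path ever touching a counter.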

I’d like an overview of the optimizations that PGO does, but I don’t find much
from looking at the Clang PGO section:
http://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization

Profile data is not used in any IPA passes yet. It is used by post-inline
optimizations, though, including block layout and the register allocator.

[LeeHu]: sorry for naïve question, but what is IPA?  And what post-inline
optimizations are currently being done?   We’re currently using Clang 3.5 if
that matters.

For example, from reading different pages on how Clang PGO works, it’s unclear if it
does “block reordering” (i.e. moving unexecuted code blocks to a distant code
page, leaving only ‘hot’ executed code packed together for greater code
density).

LLVM's block placement uses branch probability and frequency data, but there
is no function splitting optimization yet.

I find mention of “hot arc” optimization (-fprofile-arcs), but I’m unclear if
this is the same thing.  Does Clang PGO do block reordering?

It does reordering, but does not do splitting/partitioning.
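
To make the distinction concrete, here is a toy sketch of profile-driven block reordering within one function. It is a simple greedy heuristic written for illustration, not LLVM's actual block placement algorithm. Edge, layout_blocks and the block numbering are assumptions of this example, and block 0 is assumed to be the entry block.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One outgoing CFG edge: target block index and the probability (taken from
// a profile) that control flows along it.
struct Edge {
  std::size_t target;
  double probability;
};

// Toy greedy layout. Start at block 0 and keep appending the most probable
// successor that has not been placed yet. When a chain ends, continue from
// the lowest-numbered block that is still unplaced.
std::vector<std::size_t> layout_blocks(
    const std::vector<std::vector<Edge>>& successors) {
  const std::size_t n = successors.size();
  std::vector<bool> placed(n, false);
  std::vector<std::size_t> order;
  order.reserve(n);

  for (std::size_t start = 0; start < n; ++start) {
    std::size_t current = start;
    while (!placed[current]) {
      placed[current] = true;
      order.push_back(current);

      // Pick the likeliest successor that is still unplaced, if any.
      const Edge* best = nullptr;
      for (const Edge& e : successors[current]) {
        if (!placed[e.target] &&
            (best == nullptr || e.probability > best->probability)) {
          best = &e;
        }
      }
      if (best == nullptr) break;  // chain ends here
      current = best->target;
    }
  }
  return order;
}
```

For a CFG where the entry block (0) branches to an error block (1) with probability 0.01 and to the normal block (2) with probability 0.99, and both rejoin at block 3, this produces the order 0, 2, 3, 1. The rarely executed block is pushed to the end of the function but stays inside it, which is reordering without splitting.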

David

Thanks,
--Lee

_______________________________________________
LLVM Developers mailing list
LLVMdev at cs.uiuc.edu
http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev


Xinliang David Li

2015-May-27 17:21 UTC

[LLVMdev] Capabilities of Clang's PGO (e.g. improving code density)

On Wed, May 27, 2015 at 10:11 AM, Lee Hunt <leehu at exchange.microsoft.com> wrote:
> We also have the “/LTCG” flag – which is seemingly similar to the “-flto”
> Clang flag -- that **does** shrink code by various means (dead code
> removal, common IL tree collapsing) because it can see all the object code
> for an entire produced target binary (e.g. .exe or .dll).
>
> Does -flto also shrink code?
>
>
>
That depends on the other options used (e.g., -Os). With LTO, the compiler sees
a larger scope and performs cross-module inlining and dead function elimination.
It does have more opportunities to shrink code.



> [LeeHu]: sorry for naïve question, but what is IPA?
>

Inter-procedural analysis/optimizations.



Randy Chapman

2015-May-27 19:40 UTC

[LLVMdev] FW: Capabilities of Clang's PGO (e.g. improving code density)

Hi David!

Thanks again for your help!  I was wondering if you could clarify one thing for
me?
> I find mention of “hot arc” optimization (-fprofile-arcs), but I’m unclear if
> this is the same thing.  Does Clang PGO do block reordering?
>
> It does reordering, but does not do splitting/partitioning.

I take this to mean that PGO does block reordering within the function?  I don’t
see that the clang driver passes anything to the linker to drive function
ordering at the linker level as well.  Is there something there that I missed,
or are you aware of any readily available tools to do so?  If not, we’ve done
some work locally on enabling that, which we will continue.

Thanks ☺
--randy


Xinliang David Li

2015-May-27 19:52 UTC

[LLVMdev] FW: Capabilities of Clang's PGO (e.g. improving code density)

On Wed, May 27, 2015 at 12:40 PM, Randy Chapman <randyc at microsoft.com>
wrote:
> I find mention of “hot arc” optimization (-fprofile-arcs), but I’m
> unclear if this is the same thing.  Does Clang PGO do block reordering?
>
> It does reordering, but does not do splitting/partitioning.
>
> I take this to mean that PGO does block reordering within the function?  I
> don’t see that the clang driver passes anything to the linker to drive
> function ordering at the linker level as well.  Is there something there
> that I missed, or are you aware of any readily available tools to do so?
> If not, we’ve done some work locally on enabling that, which we will
> continue.
>
OK. There are three reordering-related optimizations:

1) Intra-procedural basic block reordering, to reduce branch cost, icache
misses and front-end stalls.
2) Function splitting/partitioning: splitting the really cold part of a
function into .text.unlikely sections.
3) Function reordering based on affinity and hotness: reordering of
functions by the linker/plugin (guided by compiler annotations).

Clang currently only does 1).

Hope this clarifies.
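
Where the compiler does not do 2) automatically, a cold path can be separated by hand at the source level. The sketch below assumes a GCC- or Clang-compatible compiler. __builtin_expect and the cold/noinline attributes are real extensions, but sum_magnitudes, handle_negative and g_negative_inputs are hypothetical names, and where the cold function is finally placed depends on the toolchain.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical bookkeeping that only matters on the rare path.
static std::size_t g_negative_inputs = 0;

// Cold path, kept out of line so the hot loop below stays compact.
// 'cold' tells the compiler this function is unlikely to be executed;
// 'noinline' stops it from being merged back into the hot loop.
__attribute__((cold, noinline))
static long handle_negative(long value) {
  ++g_negative_inputs;
  return -value;  // hypothetical policy: use the magnitude
}

// Hot path: sums the inputs, treating negative values as rare.
long sum_magnitudes(const long* values, std::size_t count) {
  long total = 0;
  for (std::size_t i = 0; i < count; ++i) {
    long v = values[i];
    if (__builtin_expect(v < 0, 0)) {  // hint: this branch is rarely taken
      v = handle_negative(v);
    }
    total += v;
  }
  return total;
}
```

This gives the block layout pass a static hint in place of profile data and moves the unlikely code into its own function, which is roughly what function splitting would do automatically from a profile.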

thanks,

David



