thr3ads.net - llvm dev - [LLVMdev] Interprocedural Block Placement algorithm, challenges and opportunities [Mar 2014]


Rahman Lavaee

2014-Mar-19 17:34 UTC

[LLVMdev] Interprocedural Block Placement algorithm, challenges and opportunities

Hi,

I have written a feedback-directed code layout optimization pass, which
currently supports basic block reordering and function reordering. It
yields substantial speedups (we were able to speed up Python by 30%). The
profiling method is window-based and context-sensitive, and it builds on
reference affinity (
https://urresearch.rochester.edu/fileDownloadForInstitutionalItem.action?itemId=28368&itemFileId=143426
)

The pass works at the IR level. Therefore, it may lose some information
during the machine code optimization passes, which makes basic block (BB)
reordering imprecise.

Eventually, I would like to see the improvement from an interprocedural
basic block reordering pass. However, with the current system there are
several challenges ahead. The most important is that the CFG is not
preserved across several passes, including code-gen-prepare, cfg-simplify,
remove-unreachable-blocks, tail-merge, and tail-duplication. So in order to
keep track of the mapping between machine basic blocks (MBBs) and IR basic
blocks (BBs), one needs to insert bookkeeping code in every function that
modifies the structure of BBs and MBBs.
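
To make the bookkeeping cost concrete, here is a minimal sketch in plain Python. It is a toy model, not LLVM code: MachineBlock, lower, merge_blocks and duplicate_block are hypothetical names, and the two transformations only loosely imitate what a merging or duplicating pass does. The point is that every transformation has to update the origin sets itself.

```python
from dataclasses import dataclass, field


@dataclass
class MachineBlock:
    """Toy stand-in for a machine basic block (not an LLVM class)."""
    name: str
    # Names of the IR basic blocks whose code ended up in this block.
    origins: set = field(default_factory=set)


def lower(ir_block_names):
    """Initial lowering: one machine block per IR block."""
    return {n: MachineBlock(n, {n}) for n in ir_block_names}


def merge_blocks(blocks, kept, removed):
    """Model a merge: 'removed' is folded into 'kept', so origins are unioned."""
    blocks[kept].origins |= blocks[removed].origins
    del blocks[removed]


def duplicate_block(blocks, source, copy_name):
    """Model a duplication: the copy inherits the origins of its source."""
    blocks[copy_name] = MachineBlock(copy_name, set(blocks[source].origins))


blocks = lower(["entry", "then", "else", "exit"])
merge_blocks(blocks, "then", "else")        # loosely imitates tail merging
duplicate_block(blocks, "exit", "exit.dup")  # loosely imitates tail duplication
for mbb in blocks.values():
    print(mbb.name, sorted(mbb.origins))
```

If any transformation skips the update, the origin sets silently go stale, which is why this approach touches every pass that restructures blocks.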

The current block placement pass (MachineBasicBlockPlacement) works at the
machine code level and, together with the new profiling structure
(SampleProfileLoader), is effective as long as context-free profiling info
is considered sufficient. However, the implementation of
SampleProfileLoader itself encourages context-sensitive info, which cannot
be provided efficiently with the current profiling structure
(<func,lineNo>).
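
As a rough illustration of the granularity problem, the sketch below counts the same samples under three different keys. The sample tuples, block names and context strings are invented for this example and do not reflect SampleProfileLoader's actual input format.

```python
from collections import Counter

# Hypothetical samples: (function, source line, IR basic block, calling context).
samples = [
    ("f", 10, "cond.true", "main->f"),
    ("f", 10, "cond.true", "main->f"),
    ("f", 10, "cond.false", "g->f"),
    ("f", 12, "exit", "main->f"),
    ("f", 12, "exit", "g->f"),
]

# Context-free, line-level key: blocks sharing a source line are indistinguishable.
by_line = Counter((func, line) for func, line, _, _ in samples)

# Block-level key: separates the two blocks that sit on line 10.
by_block = Counter((func, bb) for func, _, bb, _ in samples)

# Context-sensitive key: additionally separates callers.
by_context = Counter((ctx, func, bb) for func, _, bb, ctx in samples)

print(by_line)
print(by_block)
print(by_context)
```

Under the <func,lineNo> key the two blocks on line 10 collapse into one count, so a layout pass cannot tell which of them is hot or from which caller.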

Is there any way to incorporate information into the emitted MBBs so that
we can get IR basic block level info instead of lineNo info?

regards

John Criswell

2014-Mar-19 19:25 UTC


Dear Rahman,

First, if you can, try to establish the mapping between BasicBlocks and
MachineBasicBlocks after all LLVM IR optimizations have run (if
you are not doing that already).

Second, there are several ideas you might want to try:

1. The llvm.pcmarker() intrinsic seems close to what you need. That
said, it looks like the optimizers are free to move calls to it around
any way they like, but perhaps most optimizations will leave them within
the basic block in which they were originally inserted.

2. A volatile load or a call to the llvm.prefetch intrinsic might be a
workable hack.  Alternatively, you could insert an inline assembly call,
which the optimizer is unlikely to move.  The key here is to provide a
unique argument to each instruction you insert so that you can map it
back to its original basic block.

3. You could insert an llvm.var.annotation or llvm.annotation intrinsic 
into each basic block and modify the code generator to recognize your 
annotation.

I'm not sure which of these would be the best option.  I would try 
llvm.pcmarker first to see if that works and then move on to the other 
options as needed.
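
The idea shared by all three options (a marker with a unique argument per basic block, read back after code generation) can be sketched in plain Python. This is only a toy model: instructions are (opcode, argument) tuples, the "marker" opcode stands in for whichever intrinsic or inline assembly is used, and insert_markers and recover_origins are hypothetical helpers, not LLVM APIs.

```python
def insert_markers(ir_blocks):
    """Prepend a marker with a unique integer ID to every block.

    Returns the instrumented blocks and a table from marker ID to block name.
    """
    instrumented, id_to_block = {}, {}
    for marker_id, (name, insts) in enumerate(ir_blocks.items()):
        instrumented[name] = [("marker", marker_id)] + list(insts)
        id_to_block[marker_id] = name
    return instrumented, id_to_block


def recover_origins(machine_blocks, id_to_block):
    """Read the surviving markers in each machine block to find its IR origins."""
    return {
        name: [id_to_block[arg] for op, arg in insts if op == "marker"]
        for name, insts in machine_blocks.items()
    }


ir_blocks = {
    "entry": [("cmp", "x"), ("br", "then")],
    "then": [("add", "x")],
    "exit": [("ret", "x")],
}
instrumented, id_to_block = insert_markers(ir_blocks)

# Pretend a later pass folded "then" into "entry" and renamed the blocks.
machine_blocks = {
    "mbb0": instrumented["entry"] + instrumented["then"],
    "mbb1": instrumented["exit"],
}
origins = recover_origins(machine_blocks, id_to_block)
print(origins)
```

The sketch assumes markers stay in their original block. If an optimization moves or deletes one, the recovered mapping is wrong for that block, which is the caveat raised in option 1.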

Regards,

John Criswell


Rahman Lavaee

2014-Mar-19 19:50 UTC


Thanks John.

Regarding

> First, if you can, try to use the mapping between BasicBlocks and
> MachineBasicBlocks after all LLVM IR optimizations have been done (if you
> are not doing that already).
>
Good point. I am not doing this at the moment, but I can and certainly
will. It seems to me that even some of the IR optimizations are target
dependent. For instance, I believe the compiler performs tail call
optimizations only if the target supports them. Therefore, I probably
need to do the instrumentation at some point during code generation.

Regarding

> Second, there are several ideas you might want to try:
>
> 1. The llvm.pcmarker() intrinsic seems close to what you need.  That said,
> it looks like the optimizers are free to move them around any way they
> like, but perhaps most optimizations will leave them within the basic block
> in which they were originally inserted.
>
> 2. A volatile load or an llvm.prefetch instruction might be a workable
> hack.  Alternatively, you could insert an inline assembly call which the
> optimizer is unlikely to move.  The key here is to provide a unique
> argument to each instruction you insert so that you can map it back to
> its original basic block.
>
> 3. You could insert an llvm.var.annotation or llvm.annotation intrinsic
> into each basic block and modify the code generator to recognize your
> annotation.
>
> I'm not sure which of these would be the best option.  I would try
> llvm.pcmarker first to see if that works and then move on to the other
> options as needed.
>
Thank you so much for the suggestions. I have not used intrinsics before,
but it looks like they can be handy. I will read the LLVM documentation to
learn more about them and see what I can do.

