thr3ads.net - llvm dev - [LLVMdev] Preferred alignment of globals

If this information is useful, please help other people find it:
Share via:

Richard Osborne

2012-Sep-18 10:11 UTC

[LLVMdev] Preferred alignment of globals > 16bytes

On 07/09/12 18:13, Chris Lattner wrote:> On Sep 7, 2012, at 8:02 AM, Richard Osborne <richard at xmos.com>
wrote:
>>>> I was a bit surprised to see these numbers hardcoded in
TargetData since everything else is taken from the datalayout string. I was
wondering what the logic was behind the number 16. Would it make sense to derive
this number from the other alignments somehow (e.g. the maximum preferred
alignment across all types). Alternatively would it make sense to make it
configurable in the datalayout string?
>>> I don't think that there is any specific logic to it.  It was
just a heuristic that "made sense at the time", not based on any
scientific evaluation.  If you want to raise it to some other arbitrary limit
(say 32 bytes) that would be fine given some performance analysis.  If you want
to design a way to express this in target data, please propose a specific way to
model it.
>>>
>>> -Chris
>> I want it to be configurable. On my target there is no advantage to
aligning anything by more than a 4 bytes - it just wastes space. I'll try to
put together a proposal for making it possible to set this per target.
> Ok, after considering this a bit more, how about we take a different
approach: despite its name, TargetData is really mostly for consumption by the
mid-level optimizers.  getPreferredAlignment is really something that only
TargetLowering[ObjectFile] should be using.  What do you think about sinking
getPreferredAlignment down to it? This will make it much easier to
target-hookize this.
>
> -ChrisHi Chris,

Sorry for not replying sooner but I've haven't had a chance to look at 
this until now. I think your suggestion makes a lot of sense. The 
preferred alignment will be a lot easier to change if it is not embedded 
in the IR in the form of the target data string. I started trying to 
moving the functions to get preferred alignment to TargetLowering but 
found there are a few mid-level optimizers that make use of the 
preferred alignment. In particular:

* lib/Analysis/ValueTracking.cpp - uses getPreferredAlignment() to 
determine number of trailing zeros of the address of a global defined in 
the current module.
* lib/Transforms/Utils/InlineFunction.cpp - Uses getPrefTypeAlignment() 
to set the alignment of the alloca it introduces when a byval argument 
is inlined
* lib/Transforms/IPO/ConstantMerge.cpp - Uses getPreferredAlignment() to 
determine the alignment of the merged constant.
* lib/Transforms/Vectorize/BBVectorize.cpp - Depending on what options 
are set this pass avoids vectorization if it would introduce loads and 
store of less than the alignment returned by getPreferredAlignment().
* lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp - 
getPrefTypeAlignment() is used to explictly set the alignment of allocas 
with unspecified alignment. Also instcombine tries to enforce preferred 
alignment for loads and stores if possible.

Does it seem reasonable to change these to use ABI alignment?

-- 
Richard Osborne | XMOS
http://www.xmos.com

Chris Lattner

2012-Sep-20 17:18 UTC

head link

[LLVMdev] Preferred alignment of globals > 16bytes

On Sep 18, 2012, at 3:11 AM, Richard Osborne <richard at xmos.com>
wrote:>>> I want it to be configurable. On my target there is no advantage to
aligning anything by more than a 4 bytes - it just wastes space. I'll try to
put together a proposal for making it possible to set this per target.
>> Ok, after considering this a bit more, how about we take a different
approach: despite its name, TargetData is really mostly for consumption by the
mid-level optimizers.  getPreferredAlignment is really something that only
TargetLowering[ObjectFile] should be using.  What do you think about sinking
getPreferredAlignment down to it? This will make it much easier to
target-hookize this.
>> 
>> -Chris
> Hi Chris,
> 
> Sorry for not replying sooner but I've haven't had a chance to look
at this until now. I think your suggestion makes a lot of sense. The preferred
alignment will be a lot easier to change if it is not embedded in the IR in the
form of the target data string. I started trying to moving the functions to get
preferred alignment to TargetLowering but found there are a few mid-level
optimizers that make use of the preferred alignment. In particular:
> 
> * lib/Analysis/ValueTracking.cpp - uses getPreferredAlignment() to
determine number of trailing zeros of the address of a global defined in the
current module.
This "works", but isn't safe.  It is relying on an assumption that
the backend will provide at least that alignment for a global, and assuming that
nothing later than it will change the alignment to 1.  This needs to be fixed.
> * lib/Transforms/Utils/InlineFunction.cpp - Uses getPrefTypeAlignment() to
set the alignment of the alloca it introduces when a byval argument is inlined
> * lib/Transforms/IPO/ConstantMerge.cpp - Uses getPreferredAlignment() to
determine the alignment of the merged constant.
Can't these just leave the alignment unset and let the code generator do its
thing?
> * lib/Transforms/Vectorize/BBVectorize.cpp - Depending on what options are
set this pass avoids vectorization if it would introduce loads and store of less
than the alignment returned by getPreferredAlignment().
I can't comment on this specifically, but this also sounds wrong.  Hal?
> * lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp -
getPrefTypeAlignment() is used to explictly set the alignment of allocas with
unspecified alignment. Also instcombine tries to enforce preferred alignment for
loads and stores if possible.

This case is more interesting to me - the idea here is that instcombine is
trying to expose alignment information beyond the ABI alignment for optimization
passes, allowing other passes to benefit from it - and allowing alignment
inference of loads and stores to the alloca, which may help codegen.

I tend to think that this is the wrong approach - optimizations that would
benefit from higher alignment should themselves force the alloca to be more
aligned, not rely on instcombine to do it.  The code generator should wait until
the alloca gets a specific alignment (which presumably will be the preferred
alignment) and then bump up the alignment of loads and stores based on that
alignment, instead of just taking what IR has and blindly going with it.

-Chris

Hal Finkel

2012-Sep-20 17:47 UTC

head link

[LLVMdev] Preferred alignment of globals > 16bytes

On Thu, 20 Sep 2012 10:18:46 -0700
Chris Lattner <clattner at apple.com> wrote:
> On Sep 18, 2012, at 3:11 AM, Richard Osborne <richard at xmos.com>
wrote:
> >>> I want it to be configurable. On my target there is no
advantage
> >>> to aligning anything by more than a 4 bytes - it just wastes
> >>> space. I'll try to put together a proposal for making it
possible
> >>> to set this per target.
> >> Ok, after considering this a bit more, how about we take a
> >> different approach: despite its name, TargetData is really mostly
> >> for consumption by the mid-level optimizers.
> >> getPreferredAlignment is really something that only
> >> TargetLowering[ObjectFile] should be using.  What do you think
> >> about sinking getPreferredAlignment down to it? This will make it
> >> much easier to target-hookize this.
> >> 
> >> -Chris
> > Hi Chris,
> > 
> > Sorry for not replying sooner but I've haven't had a chance to
look
> > at this until now. I think your suggestion makes a lot of sense.
> > The preferred alignment will be a lot easier to change if it is not
> > embedded in the IR in the form of the target data string. I started
> > trying to moving the functions to get preferred alignment to
> > TargetLowering but found there are a few mid-level optimizers that
> > make use of the preferred alignment. In particular:
> > 
> > * lib/Analysis/ValueTracking.cpp - uses getPreferredAlignment() to
> > determine number of trailing zeros of the address of a global
> > defined in the current module.
> 
> This "works", but isn't safe.  It is relying on an assumption
that
> the backend will provide at least that alignment for a global, and
> assuming that nothing later than it will change the alignment to 1.
> This needs to be fixed.
> 
> > * lib/Transforms/Utils/InlineFunction.cpp - Uses
> > getPrefTypeAlignment() to set the alignment of the alloca it
> > introduces when a byval argument is inlined
> > * lib/Transforms/IPO/ConstantMerge.cpp - Uses
> > getPreferredAlignment() to determine the alignment of the merged
> > constant.
> 
> Can't these just leave the alignment unset and let the code generator
> do its thing?
> 
> > * lib/Transforms/Vectorize/BBVectorize.cpp - Depending on what
> > options are set this pass avoids vectorization if it would
> > introduce loads and store of less than the alignment returned by
> > getPreferredAlignment().
> 
> I can't comment on this specifically, but this also sounds wrong.
> Hal?
This is correct, but I don't think that it has anything to do with
globals.

Some SIMD ISAs allow efficient loads and stores of vectors with
the scalar-element alignment and some don't. For those that don't, we
don't want the vectorizer to form these unaligned loads because this
might cause prevent vector-aligned loads from being generated. In
theory we could always form the unaligned loads and let the legalizer
break them up as necessary, but this often leads to suboptimal code
generation for various reasons.
> 
> > * lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp -
> > getPrefTypeAlignment() is used to explictly set the alignment of
> > allocas with unspecified alignment. Also instcombine tries to
> > enforce preferred alignment for loads and stores if possible.
> 
> 
> This case is more interesting to me - the idea here is that
> instcombine is trying to expose alignment information beyond the ABI
> alignment for optimization passes, allowing other passes to benefit
> from it - and allowing alignment inference of loads and stores to the
> alloca, which may help codegen.
> 
> I tend to think that this is the wrong approach - optimizations that
> would benefit from higher alignment should themselves force the
> alloca to be more aligned, not rely on instcombine to do it.  The
> code generator should wait until the alloca gets a specific alignment
> (which presumably will be the preferred alignment) and then bump up
> the alignment of loads and stores based on that alignment, instead of
> just taking what IR has and blindly going with it.
Interesting, I did not know about this; it seems like the vectorizer
should hook into this as well.

 -Hal
> 
> -Chris
> 


-- 
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory

Maybe Matching Threads

Search for more seemingly similar threads

llvm dev - Sep 2012 - [LLVMdev] Preferred alignment of globals > 16bytes

[LLVMdev] Preferred alignment of globals > 16bytes

[LLVMdev] Preferred alignment of globals > 16bytes

[LLVMdev] Preferred alignment of globals > 16bytes

Maybe Matching Threads