Tim Northover via llvm-dev
2019-May-14 17:55 UTC
[llvm-dev] How to change CLang struct alignment behaviour?
Hi John, On Tue, 14 May 2019 at 17:51, Joan Lluch <joan.lluch at icloud.com> wrote:> This problem is also shared by the MSP430 target, and it’s very easy to reproduce by just compiling the code that I posted before.That's some good detective work; it definitely explains what you're seeing. Since MSP430 is affected it would probably be pretty easy to upstream an alignment-aware version of the function and test it, if you're keen. I'd promise to do a review promptly! Cheers. Tim.
Joan Lluch via llvm-dev
2019-May-14 21:21 UTC
[llvm-dev] How to change CLang struct alignment behaviour?
Hi Tim, I am not sure about what you mean by “upstream” the function. I attach a text file with my proposed changes in case you can do something with it. It seems to me that the programmer that coded this function was already considering this, or maybe somebody spoiled it later, because it turns out that the problem is just a missing sentence in the function, that I have now added. I have tested it the best that I could for several targets, but I have not applied any systematised testing procedure. I also found that not only the MSP430 target is affected, but also the MIPS16 target too. For the MIPS16 you also need to remove the “getOptimalMemOpType” or make it return MVT::Other, because the current implementation returning a concrete MVT prevents the proposed changes to take the desired effect. The changes benefit the MIPS16 (llc -march=mips -mips-os16) because it’s a subtarget that does not support unaligned accesses. Thanks for your help. The function is attached, it just has an added statement John Lluch> On 14 May 2019, at 19:55, Tim Northover <t.p.northover at gmail.com> wrote: > > Hi John, > > On Tue, 14 May 2019 at 17:51, Joan Lluch <joan.lluch at icloud.com> wrote: >> This problem is also shared by the MSP430 target, and it’s very easy to reproduce by just compiling the code that I posted before. > > That's some good detective work; it definitely explains what you're > seeing. Since MSP430 is affected it would probably be pretty easy to > upstream an alignment-aware version of the function and test it, if > you're keen. I'd promise to do a review promptly! > > Cheers. > > Tim.-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20190514/8454ab16/attachment.html> -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: FindOptimalMemOpLowering.txt URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20190514/8454ab16/attachment.txt> -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20190514/8454ab16/attachment-0001.html>
Joan Lluch via llvm-dev
2019-May-17 20:12 UTC
[llvm-dev] How to change CLang struct alignment behaviour?
Hello Tim,
After a second look at the topic I have gotten a better understanding on what’s
going on.
On the original implementation FindOptimalMemOpLowering and getOptimalMemOpType
gets called to find the minimal set of load/store pairs to replace a memcpy
operation. So for example, a 5 byte memcpy on a 32 bit architecture gets
replaced by a 32 bit load/store and a 8 bit extload/truncstore. In a similar
way, for a 16 bit architecture this will get replaced by a two 16 bit
load/stores and one 8 bit extload/truncstore That’s the optimal way to deal
with it, however, for targets not implementing the
<TheTarget>TargetLowering::allowsMisalignedMemoryAccesses function, this
causes a call to expandUnalignedLoad at a later stage, which inserts the
undesired shift/or code sequences.
On my previous fix approach, I attempted to solve the issue by getting the
FindOptimalMemOpLowering and getOptimalMemOpType to use the smallest possible
alignment for both the source and the destination. This avoids the shift/or code
because the generated load/stores are always of the smaller type. This kind of
works, but that approach generates a larger number of load/store pairs than the
original implementation. For example, for a 5 byte memcpy it will generate 5
single byte load/store pairs, which may be more or less optimal depending on the
target or whether you aim to optimize for speed or size.
The ultimate reason for the original problem is that the ‘load’ instructions are
conservatively created with the source operand alignment. But the loads in
reality never attempt to access unaligned offsets. Indeed, all the load/store
replacements (regardless of size) are created on aligned offsets with respect to
the origin of the memcpy operands, regardless of the target supporting or not
misaligned accesses. This means that there’s no need to expand the load/stores
into shifts/ors even for targets not supporting misaligned accesses, PROVIDED
THAT we know that the initial operand addresses of the memcpy were aligned
already.
So this leads to a problem of a more difficult solution, we need to infer more
information from the memcpy operands. The current getMemcpyLoadsAndStores
implementation already checks for the destination to be the stack frame, and
uses this to provide bigger alignments to stores. For the source operand the
function calls SelectionDAG::InferPtrAlignment. Unfortunately, the later always
returns 0 rather than the proper alignment for the source node, which is the
seed of the problem. Instead, if we already knew that pointers to a base Global
Address are aligned to 2 for a particular architecture, we should use that
information to set SrcAlign, and therefore produce an optimal sequence of
load/stores for memcpy replacement.
We need to find the nature of the source and set SrcAlign to the correct value
(greater than 1) at the beginning of getMemcpyLoadsAndStores. That would be the
perfect fix.
Alternatively, we could provide the correct Alignment argument already to the
getMemcpy function
So my question now goes back to my original one, how do I do that?
By the way, if I manually set align to 2 in Clang output files, such as on the
line below, (which corresponds to a global var declaration):
@ga = dso_local local_unnamed_addr global %struct.AA { i8 3, i8 4, i8 5, i8 6 },
align 1
replaced by
@ga = dso_local local_unnamed_addr global %struct.AA { i8 3, i8 4, i8 5, i8 6 },
align 2
then everything turns perfect inside LLVM.
So again, I need a hook either in CLang or in LLVM to set the desired align to
global vars and stack allocas.
Any ideas?
John Lluch
> On 14 May 2019, at 23:21, Joan Lluch <joan.lluch at icloud.com>
wrote:
>
> Hi Tim,
>
> I am not sure about what you mean by “upstream” the function. I attach a
text file with my proposed changes in case you can do something with it. It
seems to me that the programmer that coded this function was already considering
this, or maybe somebody spoiled it later, because it turns out that the problem
is just a missing sentence in the function, that I have now added. I have tested
it the best that I could for several targets, but I have not applied any
systematised testing procedure.
>
> I also found that not only the MSP430 target is affected, but also the
MIPS16 target too.
>
> For the MIPS16 you also need to remove the “getOptimalMemOpType” or make it
return MVT::Other, because the current implementation returning a concrete MVT
prevents the proposed changes to take the desired effect. The changes benefit
the MIPS16 (llc -march=mips -mips-os16) because it’s a subtarget that does not
support unaligned accesses.
>
> Thanks for your help. The function is attached, it just has an added
statement
>
> John Lluch
>
> <FindOptimalMemOpLowering.txt>
>
>
>
>> On 14 May 2019, at 19:55, Tim Northover <t.p.northover at gmail.com
<mailto:t.p.northover at gmail.com>> wrote:
>>
>> Hi John,
>>
>> On Tue, 14 May 2019 at 17:51, Joan Lluch <joan.lluch at icloud.com
<mailto:joan.lluch at icloud.com>> wrote:
>>> This problem is also shared by the MSP430 target, and it’s very
easy to reproduce by just compiling the code that I posted before.
>>
>> That's some good detective work; it definitely explains what
you're
>> seeing. Since MSP430 is affected it would probably be pretty easy to
>> upstream an alignment-aware version of the function and test it, if
>> you're keen. I'd promise to do a review promptly!
>>
>> Cheers.
>>
>> Tim.
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20190517/33b354d4/attachment.html>