Joan Lluch via llvm-dev
2019-May-13 09:17 UTC
[llvm-dev] How to change CLang struct alignment behaviour?
I had already adjusted MaxStoresPerMemcpy to my preferred value, and this works well. However, where loads/stores are used on structs whose size is not a multiple of the 2-byte word, the odd behaviour still happens. For my test struct of 3 chars (3 bytes), the memcpy replacement consists of a single-byte load and store of the last char (this is correct), followed by a 16-bit move of the first two chars. This is also correct, but the odd thing is that the 16-bit move is performed by picking bytes separately from the source struct, combining them into 16-bit values by means of shifts, swaps and ors, and then moving them as words to the destination. I don’t know why simple 16-bit loads and stores are not used instead.

I will try to trace findOptimalMemOpLowering with the debugger to see if that sheds some light. Thanks for pointing that out to me.

John

Tel: 620 28 45 13

> On 13 May 2019, at 09:33, Tim Northover <t.p.northover at gmail.com> wrote:
>
> Hi Joan,
>
> On Mon, 13 May 2019 at 07:53, Joan Lluch <joan.lluch at icloud.com> wrote:
>> The reason I want structs to be aligned/padded to 2 bytes is because my architecture only has 16 bit operations. I can read (sign and zero extended) and write (truncated) 8 bit data from/to memory, but all intermediate operations in registers are performed in 16 bit registers.
>
> This is very normal. Mostly it's at 32-bits rather than 16, but it
> applies to basically every RISC architecture so LLVM should handle it
> well without adjusting the alignment requirements of types.
>
>> This causes LLVM to generate odd tricks such as shifts and byte-swaps, when trying to replace struct ‘memcpy’s by word sized load/store instructions.
>
> That sounds odd, as if you've not taught the backend to use those
> 8-bit loads and stores so it's trying to emulate them with word-sized
> ones (early Alpha chips genuinely didn't have byte access so had to do
> that kind of thing). You can (and should) probably fix that.
>
> Also, there are a few customization points where you can control how
> memcpy is implemented. The function "findOptimalMemOpLowering" lets
> you control the type used for the loads and stores, and
> MaxStoresPerMemcpy controls when LLVM will call the real memcpy. If
> you want even more control you can implement EmitTargetCodeForMemcpy
> to do the whole thing.
>
> Cheers.
>
> Tim.
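To make the two shapes of the 3-byte copy in Joan's message concrete, here is a rough host-side C++ sketch. It is not output from the backend in question, and the function names (hostIsLittleEndian, copy3Expanded, copy3Direct) are hypothetical. The first copy routine mimics the reported sequence (two byte loads combined with a shift and an or, then a word store); the second is the plain 16-bit load/store that was expected. Both leave the same bytes in the destination, which is why the expanded form is only a pessimization and not a miscompile.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: true when the host stores the low byte first.
inline bool hostIsLittleEndian() {
  const std::uint16_t probe = 1;
  unsigned char first;
  std::memcpy(&first, &probe, 1);
  return first == 1;
}

// Shape reported in the thread: two byte loads, shift + or into a 16-bit
// value, one word store, then a byte copy for the third char.
inline void copy3Expanded(unsigned char *dst, const unsigned char *src) {
  const std::uint16_t b0 = src[0];
  const std::uint16_t b1 = src[1];
  const std::uint16_t word =
      hostIsLittleEndian() ? static_cast<std::uint16_t>(b0 | (b1 << 8))
                           : static_cast<std::uint16_t>((b0 << 8) | b1);
  std::memcpy(dst, &word, sizeof word);  // word store
  dst[2] = src[2];                       // trailing byte
}

// Shape that was expected: one 16-bit load/store plus one byte load/store.
inline void copy3Direct(unsigned char *dst, const unsigned char *src) {
  std::uint16_t word;
  std::memcpy(&word, src, sizeof word);  // word load
  std::memcpy(dst, &word, sizeof word);  // word store
  dst[2] = src[2];                       // trailing byte
}
```

std::memcpy is used for the 16-bit accesses so the sketch stays well defined on hosts that do not allow misaligned word access.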
Joan Lluch via llvm-dev
2019-May-13 17:01 UTC
[llvm-dev] How to change CLang struct alignment behaviour?
Hi Tim,

After looking at it a bit further, I think this is a Clang thing. Clang issues “align 2” if the struct has at least one int (2 bytes), but also if the entire struct size is a multiple of 2, for example a struct with 4 char members. In these cases the LLVM backend correctly creates word-sized loads/stores (2 bytes). However, Clang issues “align 1” for the remaining cases, for example for 3-byte structs with 3 char members.

The LLVM backend just follows what Clang dictates regarding alignment, and thus it creates 2-byte or 1-byte load/store instructions accordingly. I have not found a way to override this in LLVM. Any suggestions are appreciated.

I have also posted this problem on the cfe-dev list.

John Lluch

Tel: 620 28 45 13
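As a small illustration of the size and alignment being discussed, the following C++ sketch shows what sizeof and alignof report for a 3-char struct on typical host ABIs, and how an explicit per-type alignas(2) pads it to 4 bytes. This is only a per-type opt-in in source code, not the global Clang change asked about above. The type and function names (Chars3, Chars3Aligned, paddedSize) are made up for the example, and the exact values assume a mainstream ABI where char structs have alignment 1.

```cpp
#include <cassert>
#include <cstddef>

// Three chars: no member needs more than 1-byte alignment.
struct Chars3 {
  char a, b, c;
};

// Same members, but the type explicitly requests 2-byte alignment,
// so the compiler adds one byte of tail padding.
struct alignas(2) Chars3Aligned {
  char a, b, c;
};

// Hypothetical helper: round size up to the next multiple of align.
constexpr std::size_t paddedSize(std::size_t size, std::size_t align) {
  return (size + align - 1) / align * align;
}

static_assert(sizeof(Chars3) == 3, "assumes a typical ABI without struct padding");
static_assert(alignof(Chars3) == 1, "char members only need 1-byte alignment");
static_assert(alignof(Chars3Aligned) == 2, "alignas(2) raises the alignment");
static_assert(sizeof(Chars3Aligned) == paddedSize(3, 2), "size is padded to 4");
```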
Tim Northover via llvm-dev
2019-May-13 18:09 UTC
[llvm-dev] How to change CLang struct alignment behaviour?
Hi Joan,

On Mon, 13 May 2019 at 18:01, Joan Lluch <joan.lluch at icloud.com> wrote:
> After looking at it a bit further, I think this is a Clang thing. Clang issues “align 2” if the struct has at least one int (2 bytes), but also if the entire struct size is multiple of 2. For example a struct with 4 char members. In these cases the LLVM backend correctly creates word sized load/stores (2 bytes).

I'm slightly surprised that it happens based purely on size, but either way LLVM should be able to cope.

> The LLVM backend just follows what’s dictated by Clang regarding alignment and thus it creates 2 byte or 1 byte load/stores instructions accordingly. I have not found a way to override this in LLVM. Any suggestions are appreciated.

That sounds right, but I don't think it explains the shifts you described before. It should work out a lot better than what you're seeing. Specifically, a 3 byte struct (for example) ought to either lower to:

    load i16, load i8 + stores

if your target can do misaligned i16 operations, or

    load i8, load i8, load i8 + stores

if not. Neither of those involves shifting operations.

I'd suggest breaking just after getMemcpyLoadsAndStores and using SelectionDAG::dump to see exactly what it's created. Then try to work out where that gets pessimized to shifts, because it's not normal.

Cheers.

Tim.
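The two expected lowerings above can be sketched as a toy planner in standalone C++. This is a simplified model for a 16-bit target written for illustration only; it is not LLVM's getMemcpyLoadsAndStores or findOptimalMemOpLowering, and the function name planCopy and its parameters are hypothetical. It returns the width in bytes of each load/store pair for a copy of a given size and known alignment.

```cpp
#include <cassert>
#include <vector>

// Toy model (not LLVM code): choose the width in bytes of each load/store
// pair when copying `size` bytes whose known alignment is `align`, on a
// target whose widest memory operation is 2 bytes.
std::vector<unsigned> planCopy(unsigned size, unsigned align,
                               bool misalignedWordOk) {
  std::vector<unsigned> widths;
  unsigned offset = 0;
  while (offset < size) {
    const unsigned remaining = size - offset;
    // A word access is known to be aligned only if the object is at least
    // 2-byte aligned and the current offset is even.
    const bool wordAligned = align >= 2 && offset % 2 == 0;
    const unsigned width =
        (remaining >= 2 && (misalignedWordOk || wordAligned)) ? 2u : 1u;
    widths.push_back(width);
    offset += width;
  }
  return widths;
}
```

With these assumptions, planCopy(3, 1, true) gives widths 2, 1 (the "load i16, load i8" case) and planCopy(3, 1, false) gives 1, 1, 1 (the three "load i8" case). Neither plan needs shifts, which is the point being made in the message.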
Tim Northover via llvm-dev
2019-May-13 18:11 UTC
[llvm-dev] How to change CLang struct alignment behaviour?
On Mon, 13 May 2019 at 18:01, Joan Lluch <joan.lluch at icloud.com> wrote:
> I have also posted this problem in the cfe-dev list.

Also: I'm still pretty certain that "solving" this in Clang is the wrong approach.

Cheers.

Tim.