Duncan Sands
2012-Nov-10 09:37 UTC
[LLVMdev] [NVPTX] llc -march=nvptx64 -mcpu=sm_20 generates invalid zero align for device function params
Hi Justin,

On 09/11/12 22:49, Justin Holewinski wrote:
> Test cases exist under test/CodeGen/NVPTX (name changed in May).

I've deleted the empty PTX directory.

> Now that I'm back at NVIDIA, I'm going to be running through the bugzilla issues (thanks Dmitry for the reports!). I have practically the exact same patch here in my queue. :)
>
> In this case, I would prefer ABI alignment for compatibility with the vendor compiler.

I don't really understand this argument. If the vendor compiler is aligning to 4 (say), then some globals will have an address that is a multiple of 4, some a multiple of 8, some a multiple of 16 and so on, depending on the accidents of just where in memory they happen to be placed. For example, if you have two 4-byte globals that follow each other in memory and that are 4-byte aligned, then if the first one has an address that is a multiple of 4 but not of 8, the second will have an address that is a multiple of 8. In short, lots of variables will be 8-byte aligned by accident. If LLVM gives them all an alignment of 8, what does that change?

OK, I will now admit that there is an effect if assumptions are being made about globals being placed next to each other: if you declare two globals A and B immediately after each other in the IR, the LLVM semantics don't guarantee that they will be laid out one immediately after the other in memory. But that's how it happens in practice, so maybe people are (wrongly) relying on that. Bumping the alignment up to 8 may add extra padding between A and B, causing B to not be at the position that such naughty people are expecting.

> It should work either way, but I do need to audit the codebase and tie up any issues here.

The IR optimizers already bump the alignment of some globals up to the preferred alignment; check out enforceKnownAlignment in Local.cpp (it ends up being called from instcombine).

Ciao, Duncan.
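The layout argument above can be made concrete with a small sketch. This is not LLVM code: `alignTo` and `layoutGlobals` are hypothetical helpers that model a linker placing globals back to back, and the base addresses are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Round `offset` up to the next multiple of `align` (a power of two).
inline std::uint64_t alignTo(std::uint64_t offset, std::uint64_t align) {
    assert(align != 0 && (align & (align - 1)) == 0 &&
           "alignment must be a power of two");
    return (offset + align - 1) & ~(align - 1);
}

// Hypothetical model: place globals of the given sizes one after another,
// starting at `base`, aligning each one to `align`.
// Returns the address assigned to each global.
inline std::vector<std::uint64_t> layoutGlobals(
        std::uint64_t base,
        const std::vector<std::uint64_t>& sizes,
        std::uint64_t align) {
    std::vector<std::uint64_t> addrs;
    std::uint64_t cursor = base;
    for (std::uint64_t size : sizes) {
        cursor = alignTo(cursor, align);  // insert padding if needed
        addrs.push_back(cursor);
        cursor += size;
    }
    return addrs;
}
```

With two 4-byte globals A and B and 4-byte alignment, `layoutGlobals(4, {4, 4}, 4)` puts B at address 8, so B is 8-byte aligned by accident. Starting from base 0, B lands at 4 with 4-byte alignment but at 8 with 8-byte alignment: the extra 4 bytes of padding are exactly what would break code that assumes B sits immediately after A.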
Justin Holewinski
2012-Nov-10 14:37 UTC
[LLVMdev] [NVPTX] llc -march=nvptx64 -mcpu=sm_20 generates invalid zero align for device function params
Perhaps "compatibility" is the wrong term to use here. For now, I would like to "match" what the vendor compiler does. I don't think using preferred alignment would hurt anything in terms of correctness, but I need to go through the entire back-end to see what effects it could have on performance (e.g. adding extra padding increases local memory usage). It could be a complete non-issue for all I know right now.

--
Thanks,
Justin Holewinski
Duncan Sands
2012-Nov-10 15:38 UTC
[LLVMdev] [NVPTX] llc -march=nvptx64 -mcpu=sm_20 generates invalid zero align for device function params
Hi Justin,

On 10/11/12 15:37, Justin Holewinski wrote:
> Perhaps "compatibility" is the wrong term to use here. For now, I would like to "match" what the vendor compiler does. I don't think using preferred alignment would hurt anything in terms of correctness, but I need to go through the entire back-end to see what effects it could have on performance (e.g. adding extra padding increases local memory usage). It could be a complete non-issue for all I know right now.

Don't forget that the backend defines what the preferred alignment is. You can set it equal to the ABI alignment. If measurements show that a higher alignment is better, then you can bump the preferred alignment up to a higher value. What I'm saying is that globals should always be output using the preferred alignment, not the ABI alignment: it is the preferred value itself you should be playing with.

Ciao, Duncan.
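A minimal sketch of that suggestion, under stated assumptions: `AlignInfo`, `emittedAlign` and `alignDirective` are hypothetical names, not LLVM or NVPTX back-end APIs, and treating an explicit alignment of 0 as "unspecified" is an assumption of this model. The point is only that the emission code always falls back to the preferred value, and the back-end tunes that value.

```cpp
#include <cassert>
#include <string>

// Hypothetical per-type alignment record, in bytes. Not an LLVM type.
struct AlignInfo {
    unsigned abi;        // minimum alignment required by the ABI
    unsigned preferred;  // alignment the back-end would like (>= abi)
};

// Alignment to emit for a global: the explicit alignment if one was given
// (nonzero), otherwise the preferred alignment. Never emits zero as long as
// the preferred alignment is nonzero.
inline unsigned emittedAlign(unsigned explicitAlign, const AlignInfo& info) {
    assert(info.abi != 0 && info.preferred >= info.abi &&
           "preferred alignment must be at least the ABI alignment");
    return explicitAlign != 0 ? explicitAlign : info.preferred;
}

// Render the alignment as a PTX-style `.align` directive.
inline std::string alignDirective(unsigned explicitAlign,
                                  const AlignInfo& info) {
    return ".align " + std::to_string(emittedAlign(explicitAlign, info));
}
```

To match the vendor compiler, the back-end would set `preferred` equal to `abi` (for example `{4, 4}`); if measurements later favor a larger value, only the `preferred` field changes (for example `{4, 8}`) and the emission code stays the same.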