Duncan Sands
2012-Nov-09 13:23 UTC
[LLVMdev] [NVPTX] llc -march=nvptx64 -mcpu=sm_20 generates invalid zero align for device function params
Hi Dmitry,> You're right, global variables use preferred alignment. And - yes, > preferred alignment in this case is bigger: 8 instead of 4. NVIDIA's > prop. compiler gives 4. However, since CUDA 5.0 ptx modules are > linkable with each other, I think alignments for externally visible > functions and data should all follow ABI rules.giving it an alignment of 8 does follow ABI rules: everything that is 8 byte aligned has an address that is a multiple of 4, i.e. it is also 4 byte aligned. It would be wrong to assume that external globals have the preferred alignment: they can only be assumed to have the ABI alignment. But as we are aligning this variable we can give it the alignment we like as long as it is at least 4. In short, I think using the preferred alignment is correct in this context.> Is there a guide on making tests? I have ~5 pending patches never > accepted simply because I'm not familiar with LLVM test system :-/Your test would go in test/CodeGen/PTX/. It should use FileCheck, take a look at test/CodeGen/X86/zext-trunc.ll for an example. As there are no PTX tests at all for the moment, you need to create a lit.local.cfg file in PTX/. Imitate the X86 one. Ciao, Duncan.> > - D. > > 2012/11/9 Duncan Sands <baldrick at free.fr>: >> Hi Dmitry, >> >> >>> I'm attaching a patch that should fix the issue mentioned above. It >>> >>> simply makes the same check seen in the same file for global >>> variables: >>> >>> emitPTXAddressSpace(PTy->getAddressSpace(), O); >>> if (GVar->getAlignment() == 0) >>> O << " .align " << (int) TD->getPrefTypeAlignment(ETy); >>> else >>> O << " .align " << GVar->getAlignment(); >> >> >> it's not quite the same because your patch uses the ABI alignment, while >> in this snippet it is the preferred alignment (which is usually the same >> as the ABI alignment, but may be bigger). >> >> >>> Could you please review and commit? Do you think it needs a test case? >> >> >> Yes, it needs a testcase. >> >> Ciao, Duncan. >> >> _______________________________________________ >> LLVM Developers mailing list >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Justin Holewinski
2012-Nov-09 21:49 UTC
[LLVMdev] [NVPTX] llc -march=nvptx64 -mcpu=sm_20 generates invalid zero align for device function params
Test cases exist under test/CodeGen/NVPTX (name changed in May). Now that I'm back at NVIDIA, I'm going to be running through the bugzilla issues (thanks Dmitry for the reports!). I have practically the exact same patch here in my queue. :) In this case, I would prefer ABI alignment for compatibility with the vendor compiler. It should work either way, but I do need to audit the codebase and tie up any issues here. On Fri, Nov 9, 2012 at 5:23 AM, Duncan Sands <baldrick at free.fr> wrote:> Hi Dmitry, > > > You're right, global variables use preferred alignment. And - yes, >> preferred alignment in this case is bigger: 8 instead of 4. NVIDIA's >> prop. compiler gives 4. However, since CUDA 5.0 ptx modules are >> linkable with each other, I think alignments for externally visible >> functions and data should all follow ABI rules. >> > > giving it an alignment of 8 does follow ABI rules: everything that is > 8 byte aligned has an address that is a multiple of 4, i.e. it is also > 4 byte aligned. > > It would be wrong to assume that external globals have the preferred > alignment: they can only be assumed to have the ABI alignment. But > as we are aligning this variable we can give it the alignment we like > as long as it is at least 4. > > In short, I think using the preferred alignment is correct in this > context. > > > Is there a guide on making tests? I have ~5 pending patches never >> accepted simply because I'm not familiar with LLVM test system :-/ >> > > Your test would go in test/CodeGen/PTX/. It should use FileCheck, > take a look at test/CodeGen/X86/zext-trunc.ll for an example. As > there are no PTX tests at all for the moment, you need to create a > lit.local.cfg file in PTX/. Imitate the X86 one. > > Ciao, Duncan. > > > >> - D. >> >> 2012/11/9 Duncan Sands <baldrick at free.fr>: >> >>> Hi Dmitry, >>> >>> >>> I'm attaching a patch that should fix the issue mentioned above. It >>>> >>>> simply makes the same check seen in the same file for global >>>> variables: >>>> >>>> emitPTXAddressSpace(PTy->**getAddressSpace(), O); >>>> if (GVar->getAlignment() == 0) >>>> O << " .align " << (int) TD->getPrefTypeAlignment(ETy); >>>> else >>>> O << " .align " << GVar->getAlignment(); >>>> >>> >>> >>> it's not quite the same because your patch uses the ABI alignment, while >>> in this snippet it is the preferred alignment (which is usually the same >>> as the ABI alignment, but may be bigger). >>> >>> >>> Could you please review and commit? Do you think it needs a test case? >>>> >>> >>> >>> Yes, it needs a testcase. >>> >>> Ciao, Duncan. >>> >>> ______________________________**_________________ >>> LLVM Developers mailing list >>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >>> http://lists.cs.uiuc.edu/**mailman/listinfo/llvmdev<http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev> >>> >> > ______________________________**_________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/**mailman/listinfo/llvmdev<http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev> >-- Thanks, Justin Holewinski -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20121109/a388799e/attachment.html>
Dmitry N. Mikushin
2012-Nov-09 21:58 UTC
[LLVMdev] [NVPTX] llc -march=nvptx64 -mcpu=sm_20 generates invalid zero align for device function params
Hi Justin, Very glad you're finally back! How's you PhD? Please see the attached updated patch. Since you think ABI is better, I'm only adding a test case. - D. 2012/11/9 Justin Holewinski <justin.holewinski at gmail.com>:> Test cases exist under test/CodeGen/NVPTX (name changed in May). Now that > I'm back at NVIDIA, I'm going to be running through the bugzilla issues > (thanks Dmitry for the reports!). I have practically the exact same patch > here in my queue. :) > > In this case, I would prefer ABI alignment for compatibility with the vendor > compiler. It should work either way, but I do need to audit the codebase > and tie up any issues here. > > > On Fri, Nov 9, 2012 at 5:23 AM, Duncan Sands <baldrick at free.fr> wrote: >> >> Hi Dmitry, >> >> >>> You're right, global variables use preferred alignment. And - yes, >>> preferred alignment in this case is bigger: 8 instead of 4. NVIDIA's >>> prop. compiler gives 4. However, since CUDA 5.0 ptx modules are >>> linkable with each other, I think alignments for externally visible >>> functions and data should all follow ABI rules. >> >> >> giving it an alignment of 8 does follow ABI rules: everything that is >> 8 byte aligned has an address that is a multiple of 4, i.e. it is also >> 4 byte aligned. >> >> It would be wrong to assume that external globals have the preferred >> alignment: they can only be assumed to have the ABI alignment. But >> as we are aligning this variable we can give it the alignment we like >> as long as it is at least 4. >> >> In short, I think using the preferred alignment is correct in this >> context. >> >> >>> Is there a guide on making tests? I have ~5 pending patches never >>> accepted simply because I'm not familiar with LLVM test system :-/ >> >> >> Your test would go in test/CodeGen/PTX/. It should use FileCheck, >> take a look at test/CodeGen/X86/zext-trunc.ll for an example. As >> there are no PTX tests at all for the moment, you need to create a >> lit.local.cfg file in PTX/. Imitate the X86 one. >> >> Ciao, Duncan. >> >> >>> >>> - D. >>> >>> 2012/11/9 Duncan Sands <baldrick at free.fr>: >>>> >>>> Hi Dmitry, >>>> >>>> >>>>> I'm attaching a patch that should fix the issue mentioned above. It >>>>> >>>>> simply makes the same check seen in the same file for global >>>>> variables: >>>>> >>>>> emitPTXAddressSpace(PTy->getAddressSpace(), O); >>>>> if (GVar->getAlignment() == 0) >>>>> O << " .align " << (int) TD->getPrefTypeAlignment(ETy); >>>>> else >>>>> O << " .align " << GVar->getAlignment(); >>>> >>>> >>>> >>>> it's not quite the same because your patch uses the ABI alignment, while >>>> in this snippet it is the preferred alignment (which is usually the same >>>> as the ABI alignment, but may be bigger). >>>> >>>> >>>>> Could you please review and commit? Do you think it needs a test case? >>>> >>>> >>>> >>>> Yes, it needs a testcase. >>>> >>>> Ciao, Duncan. >>>> >>>> _______________________________________________ >>>> LLVM Developers mailing list >>>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >> >> >> _______________________________________________ >> LLVM Developers mailing list >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > > > > -- > > Thanks, > > Justin Holewinski >-------------- next part -------------- A non-text attachment was scrubbed... Name: align0.patch Type: application/octet-stream Size: 898 bytes Desc: not available URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20121109/e9b58ed3/attachment.obj>
Duncan Sands
2012-Nov-10 09:37 UTC
[LLVMdev] [NVPTX] llc -march=nvptx64 -mcpu=sm_20 generates invalid zero align for device function params
Hi Justin, On 09/11/12 22:49, Justin Holewinski wrote:> Test cases exist under test/CodeGen/NVPTX (name changed in May).I've deleted the empty PTX directory. Now that I'm> back at NVIDIA, I'm going to be running through the bugzilla issues (thanks > Dmitry for the reports!). I have practically the exact same patch here in my > queue. :) > > In this case, I would prefer ABI alignment for compatibility with the vendor > compiler.I don't really understand this argument. If the vendor compiler is aligning to 4 (say) then some globals will have address a multiple of 4, some will have address a multiple of 8, some will have address a multiple of 16 etc, depending on the accidents of just where in memory they happen to be placed. For example, if you have two 4 byte globals that follow each other in memory, and that are 4 byte aligned, then if the first one has address a multiple of 4 then the second will have address a multiple of 8. In short lots of variables will be 8 byte aligned by accident. If LLVM gives them all an alignment of 8, what does that change? OK, I will now admit that there is an effect if assumptions are being made about globals being placed next to each other: if you declare two globals A and B immediately after each other in the IR then the LLVM semantics doesn't guarantee that they will be laid out one immediately after the other in memory. But that's how it happens in practice so maybe people are (wrongly) relying on that. Bumping up the alignment to a multiple of 8 may add extra padding between A and B, causing B to not be at the position that such naughty people are expecting. It should work either way, but I do need to audit the codebase and> tie up any issues here.The IR optimizers already bump the alignment of some globals up to the preferred alignment, check out enforceKnownAlignment in Local.cpp (it ends up being called from instcombine). Ciao, Duncan.
Maybe Matching Threads
- [LLVMdev] [NVPTX] llc -march=nvptx64 -mcpu=sm_20 generates invalid zero align for device function params
- [LLVMdev] [NVPTX] llc -march=nvptx64 -mcpu=sm_20 generates invalid zero align for device function params
- [LLVMdev] [NVPTX] llc -march=nvptx64 -mcpu=sm_20 generates invalid zero align for device function params
- [LLVMdev] [NVPTX] llc -march=nvptx64 -mcpu=sm_20 generates invalid zero align for device function params
- [LLVMdev] [NVPTX] llc -march=nvptx64 -mcpu=sm_20 generates invalid zero align for device function params