thr3ads.net - llvm dev - [LLVMdev] [NVPTX] llc -march=nvptx64 -mcpu=sm_20 generates invalid zero align for device function params [Nov 2012]

If this information is useful, please help other people find it:
Share via:

Duncan Sands

2012-Nov-09 13:23 UTC

[LLVMdev] [NVPTX] llc -march=nvptx64 -mcpu=sm_20 generates invalid zero align for device function params

Hi Dmitry,
> You're right, global variables use preferred alignment. And - yes,
> preferred alignment in this case is bigger: 8 instead of 4. NVIDIA's
> prop. compiler gives 4. However, since CUDA 5.0 ptx modules are
> linkable with each other, I think alignments for externally visible
> functions and data should all follow ABI rules.
giving it an alignment of 8 does follow ABI rules: everything that is
8 byte aligned has an address that is a multiple of 4, i.e. it is also
4 byte aligned.

It would be wrong to assume that external globals have the preferred
alignment: they can only be assumed to have the ABI alignment.  But
as we are aligning this variable we can give it the alignment we like
as long as it is at least 4.

In short, I think using the preferred alignment is correct in this
context.
> Is there a guide on making tests? I have ~5 pending patches never
> accepted simply because I'm not familiar with LLVM test system :-/
Your test would go in test/CodeGen/PTX/.  It should use FileCheck,
take a look at test/CodeGen/X86/zext-trunc.ll for an example.  As
there are no PTX tests at all for the moment, you need to create a
lit.local.cfg file in PTX/.  Imitate the X86 one.

Ciao, Duncan.
>
> - D.
>
> 2012/11/9 Duncan Sands <baldrick at free.fr>:
>> Hi Dmitry,
>>
>>
>>> I'm attaching a patch that should fix the issue mentioned
above. It
>>>
>>> simply makes the same check seen in the same file for global
>>> variables:
>>>
>>>     emitPTXAddressSpace(PTy->getAddressSpace(), O);
>>>     if (GVar->getAlignment() == 0)
>>>       O << " .align " << (int)
TD->getPrefTypeAlignment(ETy);
>>>     else
>>>       O << " .align " <<
GVar->getAlignment();
>>
>>
>> it's not quite the same because your patch uses the ABI alignment,
while
>> in this snippet it is the preferred alignment (which is usually the
same
>> as the ABI alignment, but may be bigger).
>>
>>
>>> Could you please review and commit? Do you think it needs a test
case?
>>
>>
>> Yes, it needs a testcase.
>>
>> Ciao, Duncan.
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Justin Holewinski

2012-Nov-09 21:49 UTC

head link

[LLVMdev] [NVPTX] llc -march=nvptx64 -mcpu=sm_20 generates invalid zero align for device function params

Test cases exist under test/CodeGen/NVPTX (name changed in May).  Now that
I'm back at NVIDIA, I'm going to be running through the bugzilla issues
(thanks Dmitry for the reports!).  I have practically the exact same patch
here in my queue. :)

In this case, I would prefer ABI alignment for compatibility with the
vendor compiler.  It should work either way, but I do need to audit the
codebase and tie up any issues here.


On Fri, Nov 9, 2012 at 5:23 AM, Duncan Sands <baldrick at free.fr> wrote:
> Hi Dmitry,
>
>
>  You're right, global variables use preferred alignment. And - yes,
>> preferred alignment in this case is bigger: 8 instead of 4.
NVIDIA's
>> prop. compiler gives 4. However, since CUDA 5.0 ptx modules are
>> linkable with each other, I think alignments for externally visible
>> functions and data should all follow ABI rules.
>>
>
> giving it an alignment of 8 does follow ABI rules: everything that is
> 8 byte aligned has an address that is a multiple of 4, i.e. it is also
> 4 byte aligned.
>
> It would be wrong to assume that external globals have the preferred
> alignment: they can only be assumed to have the ABI alignment.  But
> as we are aligning this variable we can give it the alignment we like
> as long as it is at least 4.
>
> In short, I think using the preferred alignment is correct in this
> context.
>
>
>  Is there a guide on making tests? I have ~5 pending patches never
>> accepted simply because I'm not familiar with LLVM test system :-/
>>
>
> Your test would go in test/CodeGen/PTX/.  It should use FileCheck,
> take a look at test/CodeGen/X86/zext-trunc.ll for an example.  As
> there are no PTX tests at all for the moment, you need to create a
> lit.local.cfg file in PTX/.  Imitate the X86 one.
>
> Ciao, Duncan.
>
>
>
>> - D.
>>
>> 2012/11/9 Duncan Sands <baldrick at free.fr>:
>>
>>> Hi Dmitry,
>>>
>>>
>>>  I'm attaching a patch that should fix the issue mentioned
above. It
>>>>
>>>> simply makes the same check seen in the same file for global
>>>> variables:
>>>>
>>>>     emitPTXAddressSpace(PTy->**getAddressSpace(), O);
>>>>     if (GVar->getAlignment() == 0)
>>>>       O << " .align " << (int)
TD->getPrefTypeAlignment(ETy);
>>>>     else
>>>>       O << " .align " <<
GVar->getAlignment();
>>>>
>>>
>>>
>>> it's not quite the same because your patch uses the ABI
alignment, while
>>> in this snippet it is the preferred alignment (which is usually the
same
>>> as the ABI alignment, but may be bigger).
>>>
>>>
>>>  Could you please review and commit? Do you think it needs a test
case?
>>>>
>>>
>>>
>>> Yes, it needs a testcase.
>>>
>>> Ciao, Duncan.
>>>
>>> ______________________________**_________________
>>> LLVM Developers mailing list
>>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>>>
http://lists.cs.uiuc.edu/**mailman/listinfo/llvmdev<http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev>
>>>
>>
> ______________________________**_________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>
http://lists.cs.uiuc.edu/**mailman/listinfo/llvmdev<http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev>
>


-- 

Thanks,

Justin Holewinski
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20121109/a388799e/attachment.html>

Dmitry N. Mikushin

2012-Nov-09 21:58 UTC

head link

[LLVMdev] [NVPTX] llc -march=nvptx64 -mcpu=sm_20 generates invalid zero align for device function params

Hi Justin,

Very glad you're finally back! How's you PhD?

Please see the attached updated patch. Since you think ABI is better,
I'm only adding a test case.

- D.

2012/11/9 Justin Holewinski <justin.holewinski at
gmail.com>:> Test cases exist under test/CodeGen/NVPTX (name changed in May).  Now that
> I'm back at NVIDIA, I'm going to be running through the bugzilla
issues
> (thanks Dmitry for the reports!).  I have practically the exact same patch
> here in my queue. :)
>
> In this case, I would prefer ABI alignment for compatibility with the
vendor
> compiler.  It should work either way, but I do need to audit the codebase
> and tie up any issues here.
>
>
> On Fri, Nov 9, 2012 at 5:23 AM, Duncan Sands <baldrick at free.fr>
wrote:
>>
>> Hi Dmitry,
>>
>>
>>> You're right, global variables use preferred alignment. And -
yes,
>>> preferred alignment in this case is bigger: 8 instead of 4.
NVIDIA's
>>> prop. compiler gives 4. However, since CUDA 5.0 ptx modules are
>>> linkable with each other, I think alignments for externally visible
>>> functions and data should all follow ABI rules.
>>
>>
>> giving it an alignment of 8 does follow ABI rules: everything that is
>> 8 byte aligned has an address that is a multiple of 4, i.e. it is also
>> 4 byte aligned.
>>
>> It would be wrong to assume that external globals have the preferred
>> alignment: they can only be assumed to have the ABI alignment.  But
>> as we are aligning this variable we can give it the alignment we like
>> as long as it is at least 4.
>>
>> In short, I think using the preferred alignment is correct in this
>> context.
>>
>>
>>> Is there a guide on making tests? I have ~5 pending patches never
>>> accepted simply because I'm not familiar with LLVM test system
:-/
>>
>>
>> Your test would go in test/CodeGen/PTX/.  It should use FileCheck,
>> take a look at test/CodeGen/X86/zext-trunc.ll for an example.  As
>> there are no PTX tests at all for the moment, you need to create a
>> lit.local.cfg file in PTX/.  Imitate the X86 one.
>>
>> Ciao, Duncan.
>>
>>
>>>
>>> - D.
>>>
>>> 2012/11/9 Duncan Sands <baldrick at free.fr>:
>>>>
>>>> Hi Dmitry,
>>>>
>>>>
>>>>> I'm attaching a patch that should fix the issue
mentioned above. It
>>>>>
>>>>> simply makes the same check seen in the same file for
global
>>>>> variables:
>>>>>
>>>>>     emitPTXAddressSpace(PTy->getAddressSpace(), O);
>>>>>     if (GVar->getAlignment() == 0)
>>>>>       O << " .align " << (int)
TD->getPrefTypeAlignment(ETy);
>>>>>     else
>>>>>       O << " .align " <<
GVar->getAlignment();
>>>>
>>>>
>>>>
>>>> it's not quite the same because your patch uses the ABI
alignment, while
>>>> in this snippet it is the preferred alignment (which is usually
the same
>>>> as the ABI alignment, but may be bigger).
>>>>
>>>>
>>>>> Could you please review and commit? Do you think it needs a
test case?
>>>>
>>>>
>>>>
>>>> Yes, it needs a testcase.
>>>>
>>>> Ciao, Duncan.
>>>>
>>>> _______________________________________________
>>>> LLVM Developers mailing list
>>>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>
>
>
> --
>
> Thanks,
>
> Justin Holewinski
>-------------- next part --------------
A non-text attachment was scrubbed...
Name: align0.patch
Type: application/octet-stream
Size: 898 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20121109/e9b58ed3/attachment.obj>

Duncan Sands

2012-Nov-10 09:37 UTC

head link

[LLVMdev] [NVPTX] llc -march=nvptx64 -mcpu=sm_20 generates invalid zero align for device function params

Hi Justin,

On 09/11/12 22:49, Justin Holewinski wrote:> Test cases exist under test/CodeGen/NVPTX (name changed in May).
I've deleted the empty PTX directory.

   Now that I'm> back at NVIDIA, I'm going to be running through the bugzilla issues
(thanks
> Dmitry for the reports!).  I have practically the exact same patch here in
my
> queue. :)
>
> In this case, I would prefer ABI alignment for compatibility with the
vendor
> compiler.
I don't really understand this argument.  If the vendor compiler is aligning
to
4 (say) then some globals will have address a multiple of 4, some will have
address a multiple of 8, some will have address a multiple of 16 etc, depending
on the accidents of just where in memory they happen to be placed.  For
example, if you have two 4 byte globals that follow each other in memory, and
that are 4 byte aligned, then if the first one has address a multiple of 4 then
the second will have address a multiple of 8.  In short lots of variables will
be 8 byte aligned by accident.  If LLVM gives them all an alignment of 8, what
does that change?  OK, I will now admit that there is an effect if assumptions
are being made about globals being placed next to each other: if you declare
two globals A and B immediately after each other in the IR then the LLVM
semantics doesn't guarantee that they will be laid out one immediately after
the other in memory.  But that's how it happens in practice so maybe people
are (wrongly) relying on that.  Bumping up the alignment to a multiple of 8
may add extra padding between A and B, causing B to not be at the position that
such naughty people are expecting.

   It should work either way, but I do need to audit the codebase
and> tie up any issues here.
The IR optimizers already bump the alignment of some globals up to the
preferred alignment, check out enforceKnownAlignment in Local.cpp (it ends
up being called from instcombine).

Ciao, Duncan.

Apparently Analagous Threads

Search for more possibly parallel threads

llvm dev - Nov 2012 - [LLVMdev] [NVPTX] llc -march=nvptx64 -mcpu=sm_20 generates invalid zero align for device function params

[LLVMdev] [NVPTX] llc -march=nvptx64 -mcpu=sm_20 generates invalid zero align for device function params

[LLVMdev] [NVPTX] llc -march=nvptx64 -mcpu=sm_20 generates invalid zero align for device function params

[LLVMdev] [NVPTX] llc -march=nvptx64 -mcpu=sm_20 generates invalid zero align for device function params

[LLVMdev] [NVPTX] llc -march=nvptx64 -mcpu=sm_20 generates invalid zero align for device function params

Apparently Analagous Threads