thr3ads.net - llvm dev - [llvm-dev] [AMDGPU] Strange results with different address spaces [Dec 2017]

Matt Arsenault via llvm-dev

2017-Dec-06 18:45 UTC

[llvm-dev] [AMDGPU] Strange results with different address spaces

> On Dec 6, 2017, at 02:28, Haidl, Michael <michael.haidl at uni-muenster.de> wrote:
>
> The IR goes through a backend agnostic preparation phase that brings it
> into SSA form and changes the AS from 0 to 1.

This sounds possibly problematic to me. The IR should be created with the
correct address space to begin with. Changing this in the middle sounds suspect.

> After this phase the IR goes through another pass manager that performs O3
> passes and the AMDGPU target passes for object file generation. I looked into
> the AMDGPU backend and the only place where this metadata is added is in
> AMDGPUAnnotateUniformValues.cpp. The pass queries divergence analysis for the
> load and checks if it is reported as uniform. Afterwards the metadata is added
> to the GEP.
>
> Removing the O3 passes before code generation solves the problem, as does
> separating the O3 passes and the backend passes into separate pass managers. I
> assume divergence analysis does not run in the second pass manager because no
> metadata is generated at all.
>
> Could this be a bug in DA reporting the load falsely as uniform by not
> taking the intrinsics into account?
>
> Cheers,
> Michael

The intrinsics certainly are correctly treated as divergent. Nothing would work
otherwise. If I run the annotate pass or analysis on the examples it does the
right thing and sees the load as divergent.

$ opt -S -analyze -divergence -o - as1.ll
Printing analysis 'Divergence Analysis' for function '_ZN5pacxx2v213genericKernelIZL12test_barrieriPPcE3$_0EEvT_':
DIVERGENT:  %6 = tail call i32 @llvm.amdgcn.workitem.id.x() #0, !range !11
DIVERGENT:  %add.i.i.i.i.i = add nsw i32 %mul.i.i.i.i.i, %6
DIVERGENT:  %idxprom.i.i.i = sext i32 %add.i.i.i.i.i to i64
DIVERGENT:  %8 = getelementptr i32, i32 addrspace(1)* %callable.coerce0, i64 %idxprom.i.i.i
DIVERGENT:  %9 = load i32, i32 addrspace(1)* %8, align 4
DIVERGENT:  %10 = getelementptr [16 x i32], [16 x i32] addrspace(3)* @"_ZN5pacxx2v213genericKernelIZL12test_barrieriPPcE3$_0EEvT__sm0", i32 0, i32 %6
DIVERGENT:  store i32 %9, i32 addrspace(3)* %10, align 4
DIVERGENT:  %11 = load i32, i32 addrspace(3)* %10, align 4
DIVERGENT:  %12 = getelementptr i32, i32 addrspace(1)* %callable.coerce1, i64 %idxprom.i.i.i
DIVERGENT:  store i32 %11, i32 addrspace(1)* %12, align 4
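
For readers unfamiliar with why every instruction above is flagged, here is a minimal, self-contained C++ sketch of the general idea: a value is divergent if it is a divergent seed (such as the work-item id) or uses a divergent operand. This is a toy model only, not LLVM's DivergenceAnalysis; it ignores control-flow (sync) dependence, and the names Inst, propagateDivergence, exampleKernel, and the shortened value names (%tid, %wgid, %wgsize, %base) are hypothetical stand-ins, not taken from the IR in this thread.

```cpp
#include <set>
#include <string>
#include <vector>

// Toy instruction (hypothetical type): a result name plus the operand names it uses.
struct Inst {
    std::string result;
    std::vector<std::string> operands;
};

// Data-dependence-only propagation: a result becomes divergent as soon as
// one of its operands is divergent. Assumes `insts` is listed in
// def-before-use order, as in straight-line SSA code.
std::set<std::string> propagateDivergence(const std::vector<Inst>& insts,
                                          const std::set<std::string>& seeds) {
    std::set<std::string> divergent = seeds;
    for (const Inst& inst : insts) {
        for (const std::string& op : inst.operands) {
            if (divergent.count(op) != 0) {
                divergent.insert(inst.result);
                break;  // one divergent operand is enough
            }
        }
    }
    return divergent;
}

// Simplified stand-in for the address computation in the kernel:
// index = (uniform product) + work-item id, then GEP, then load.
// What %mul is computed from is an assumption made for illustration.
std::vector<Inst> exampleKernel() {
    return {
        {"%mul", {"%wgid", "%wgsize"}},  // only uniform inputs: stays uniform
        {"%add", {"%mul", "%tid"}},      // uses the work-item id: divergent
        {"%idx", {"%add"}},
        {"%gep", {"%base", "%idx"}},
        {"%load", {"%gep"}},
    };
}
```

Calling propagateDivergence(exampleKernel(), {"%tid"}) marks %add, %idx, %gep and %load as divergent and leaves %mul uniform, which mirrors the shape of the dump: once the work-item id feeds the index, the GEP and the load that depend on it cannot be uniform.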

I’m also questioning how/where you obtained this dump. You have the declarations
for the control flow intrinsics in there, which should only ever appear when the
backend inserts them as part of codegen. There’s something suspicious about your
pass setup. What does the IR look like immediately before
AMDGPUAnnotateUniformValues, and immediately out of the frontend?

-Matt
Haidl, Michael via llvm-dev

2017-Dec-08 06:51 UTC

Hi Matt,

Thanks for your response. I agree that the IR should be generated with the
correct AS in the first place. However, for my project this is not really
possible: I need the same IR with everything in AS 0 for CPU execution, and
again with GPU-specific address spaces to avoid the performance impact of the
generic address space. Doing this in the front-end would mean a far more
intrusive change to clang, which is a route I did not want to take in the
first place.

I have the IR that goes into the pass manager attached to the mail.
The PM is set up as follows:

  // Floating-point options: no unsafe-math assumptions, but aggressive FP fusion.
  llvm::TargetOptions options;
  options.UnsafeFPMath = false;
  options.NoInfsFPMath = false;
  options.NoNaNsFPMath = false;
  options.HonorSignDependentRoundingFPMathOption = false;
  options.AllowFPOpFusion = FPOpFusion::Fast;

  Triple TheTriple = Triple(M.getTargetTriple());
  std::string Error;
  // The object file is emitted into this in-memory buffer.
  SmallString<128> hsaString;
  llvm::raw_svector_ostream hsaOS(hsaString);
  // Look up the amdgcn target once and cache it in _target.
  if (!_target)
    _target = TargetRegistry::lookupTarget("amdgcn", TheTriple, Error);
  if (!_target) {
    throw common::generic_exception(Error);
  }

  // One legacy pass manager receives the O3 pipeline and, further down,
  // the backend code generation passes as well.
  llvm::legacy::PassManager PM;
  PassManagerBuilder builder;
  builder.OptLevel = 3;
  builder.populateModulePassManager(PM);

  _machine.reset(_target->createTargetMachine(
      TheTriple.getTriple(), _cpu, _features, options, Reloc::Model::Static,
      CodeModel::Model::Medium, CodeGenOpt::Aggressive));

  // Append the AMDGPU codegen passes to the same pass manager.
  if (_machine->addPassesToEmitFile(PM, hsaOS,
                                    TargetMachine::CGFT_ObjectFile, false)) {
    throw std::logic_error(
        "target does not support generation of this file type!\n");
  }

  PM.run(M);

The IR from the original post was dumped after PM has finished its work because
I could not figure out where the problem arises with just the IR before the PM
starts working. Running opt with -O3 on the IR does not change much in the IR
and on both versions of the IR I get the same output you did in your first
response.


DIVERGENT:  %6 = tail call i32 @llvm.amdgcn.workitem.id.x() #0
DIVERGENT:  %add.i.i.i.i.i = add nsw i32 %mul.i.i.i.i.i, %6
DIVERGENT:  %idxprom.i.i.i = sext i32 %add.i.i.i.i.i to i64
DIVERGENT:  %8 = getelementptr i32, i32 addrspace(1)* %callable.coerce0, i64 %idxprom.i.i.i
DIVERGENT:  %9 = load i32, i32 addrspace(1)* %8, align 4
DIVERGENT:  %10 = getelementptr [16 x i32], [16 x i32] addrspace(3)* @"_ZN5pacxx2v213genericKernelIZL12test_barrieriPPcE3$_0EEvT__sm0", i32 0, i32 %6
DIVERGENT:  store i32 %9, i32 addrspace(3)* %10, align 4
DIVERGENT:  %11 = load i32, i32 addrspace(3)* %10, align 4
DIVERGENT:  %12 = getelementptr i32, i32 addrspace(1)* %callable.coerce1, i64 %idxprom.i.i.i
DIVERGENT:  store i32 %11, i32 addrspace(1)* %12, align 4

I cannot see where the uniform access comes into play.

Cheers,
Michael


-------------- next part --------------
A non-text attachment was scrubbed...
Name: final.ll
Type: application/octet-stream
Size: 3090 bytes
Desc: final.ll
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20171208/fb4c3dbb/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: final_O3.ll
Type: application/octet-stream
Size: 2868 bytes
Desc: final_O3.ll
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20171208/fb4c3dbb/attachment-0001.obj>

Matt Arsenault via llvm-dev

2017-Dec-08 19:54 UTC

> On Dec 8, 2017, at 01:51, Haidl, Michael <michael.haidl at uni-muenster.de> wrote:
>
> Hi Matt,
>
> thanks for your response. I agree that the IR should be generated with the
> correct AS in the first place. However, for my project this is somehow
> impossible.
> I need the same IR with everything in AS 0 for CPU execution and again with
> GPU specific address spaces to avoid performance impacts of the generic address
> space.

We have an optimization pass to eliminate generic accesses, so for the most part
you shouldn’t have to worry about this too much. You can insert casts to flat
and generally expect them to be eliminated. This is what HCC is doing now.

> Doing this in the front-end means a way more intrusive change to clang and
> a way I did not want to go in the first place.
>  
> I have the IR that goes into the pass manager attached to the mail.
> The PM is set up as follows:
>
>   llvm::TargetOptions options;
>   options.UnsafeFPMath = false;
>   options.NoInfsFPMath = false;
>   options.NoNaNsFPMath = false;
>   options.HonorSignDependentRoundingFPMathOption = false;
>   options.AllowFPOpFusion = FPOpFusion::Fast;
>
>   Triple TheTriple = Triple(M.getTargetTriple());
>   std::string Error;
>   SmallString<128> hsaString;
>   llvm::raw_svector_ostream hsaOS(hsaString);
>   if (!_target)
>     _target = TargetRegistry::lookupTarget("amdgcn", TheTriple, Error);
>   if (!_target) {
>     throw common::generic_exception(Error);
>   }
>
>   llvm::legacy::PassManager PM;
>   PassManagerBuilder builder;
>   builder.OptLevel = 3;
>   builder.populateModulePassManager(PM);
>
>   _machine.reset(_target->createTargetMachine(
>       TheTriple.getTriple(), _cpu, _features, options, Reloc::Model::Static,
>       CodeModel::Model::Medium, CodeGenOpt::Aggressive));
>
>   if (_machine->addPassesToEmitFile(PM, hsaOS,
>                                     TargetMachine::CGFT_ObjectFile, false)) {
>     throw std::logic_error(
>         "target does not support generation of this file type!\n");
>   }
>
>   PM.run(M);

I can’t see what’s going on with this. I would look into what happens when
AMDGPUTTIImpl::isSourceOfDivergence is called in your broken example. I don’t
see exactly how it would happen or how it would cause this, but I’m guessing
something went wrong where the wrong address space mapping is being used at some
point.

-Matt