2018 Jun 21
NVPTX - Reordering load instructions
Hi all,
I'm looking into the performance difference between a benchmark compiled
with NVCC and the same benchmark compiled with NVPTX (coming from Julia,
not CUDA C), and I'm seeing a significant difference due to PTX
instruction ordering. The relevant
source code consists of two nested loops that get fully unrolled, doing
some basic arithmetic with values loaded from shared memory:
> #define BLOCK_SIZE 16
>
>
2018 Jun 21
NVPTX - Reordering load instructions
...could make a custom pass, late IR or MI. You might also be able to
> use the existing instruction-scheduling infrastructure. You can
> implement a ScheduleDAGMutation that does the clustering that you'd like,
> or, if the existing ones do what you want, use those. We have preexisting
> createLoadClusterDAGMutation and createStoreClusterDAGMutation
> functions. If you look at AMDGPU/AMDGPUTargetMachine.cpp, you'll see
> calls like this:
>
> DAG->addMutation(createLoadClusterDAGMutation(DAG->TII, DAG->TRI));
>
> and I think that you probably want to do the same.
>
>...
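
To make the idea of load clustering concrete, here is a toy, self-contained sketch. It is not LLVM code and does not use the ScheduleDAGMutation interface; `ToyInstr` and `clusterLoads` are hypothetical names invented for illustration. It only shows the effect being asked for: a greedy list scheduler that, among the instructions whose dependencies are satisfied, prefers loads, so that independent loads end up next to each other instead of being interleaved with arithmetic.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy instruction (hypothetical, not an LLVM type).
struct ToyInstr {
    std::string text;               // printable form, for illustration only
    bool isLoad;                    // true if this is a memory load
    std::vector<std::size_t> deps;  // indices of instructions that must run first
};

// Greedy list scheduling: at each step, emit the first ready load if there
// is one, otherwise the first ready instruction in original program order.
// Returns the new order as indices into `prog`.
std::vector<std::size_t> clusterLoads(const std::vector<ToyInstr>& prog) {
    std::vector<bool> done(prog.size(), false);
    std::vector<std::size_t> order;
    while (order.size() < prog.size()) {
        std::size_t pick = prog.size();  // sentinel: nothing picked yet
        for (std::size_t i = 0; i < prog.size(); ++i) {
            if (done[i]) continue;
            bool ready = true;
            for (std::size_t d : prog[i].deps) {
                assert(d < prog.size());
                if (!done[d]) { ready = false; break; }
            }
            if (!ready) continue;
            if (prog[i].isLoad) { pick = i; break; }  // a ready load wins
            if (pick == prog.size()) pick = i;        // remember first ready non-load
        }
        if (pick == prog.size()) break;  // cyclic dependencies: stop early
        done[pick] = true;
        order.push_back(pick);
    }
    return order;
}
```

For a program of the form load, fma, load, fma, where each fma depends on the load before it and the second fma also depends on the first, this moves the second load up so that both loads are adjacent and both fmas follow. A real ScheduleDAGMutation would express the same preference by adding edges to the scheduling DAG rather than by reordering a list directly.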