search for: createloadclusterdagmut

Displaying 2 results from an estimated 2 matches for "createloadclusterdagmut".

2018 Jun 21
2
NVPTX - Reordering load instructions
Hi all, I'm looking into the performance difference of a benchmark compiled with NVCC vs NVPTX (coming from Julia, not CUDA C) and I'm seeing a significant difference due to PTX instruction ordering. The relevant source code consists of two nested loops that get fully unrolled, doing some basic arithmetic with values loaded from shared memory:
> #define BLOCK_SIZE 16
>
2018 Jun 21
2
NVPTX - Reordering load instructions
...could make a custom pass, late IR or MI. You might also be able to
> use the existing instruction-scheduling infrastructure. You can
> implement ScheduleDAGMutation that does the clustering that you'd like,
> or if the existing ones do what you want, use those. We have preexisting
> createLoadClusterDAGMutation and createStoreClusterDAGMutation
> functions. If you look at AMDGPU/AMDGPUTargetMachine.cpp, you'll see
> calls like this:
>
> DAG->addMutation(createLoadClusterDAGMutation(DAG->TII, DAG->TRI));
>
> and I think that you probably want to do the same.
> ...
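
For readers unfamiliar with the mutation pattern the reply describes, here is a minimal, self-contained toy model of it. It is only a sketch and does not use LLVM: `ToyInstr`, `ToyDAG`, `ToyMutation`, `ToyLoadCluster`, `createToyLoadCluster` and `demoClusteredOrder` are hypothetical names invented for illustration, and the model assumes every instruction is independent, so any reordering is legal. LLVM's real load clustering works on a dependency graph and only hints to the scheduler that neighbouring loads should stay together; this toy simply moves loads to the front and orders them by base register and offset.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a machine instruction (not an LLVM type).
struct ToyInstr {
  std::string name;
  bool isLoad;
  int baseReg;          // base address register id; meaningful for loads only
  std::int64_t offset;  // byte offset from the base; meaningful for loads only
};

struct ToyDAG;

// Hypothetical mutation interface: a hook that may rearrange the DAG.
struct ToyMutation {
  virtual ~ToyMutation() = default;
  virtual void apply(ToyDAG &dag) = 0;
};

// Hypothetical scheduler: holds instructions plus registered mutations.
struct ToyDAG {
  std::vector<ToyInstr> instrs;
  std::vector<std::unique_ptr<ToyMutation>> mutations;

  void addMutation(std::unique_ptr<ToyMutation> m) {
    mutations.push_back(std::move(m));
  }

  // Run every registered mutation in registration order.
  void schedule() {
    for (auto &m : mutations) m->apply(*this);
  }
};

// Toy clustering: ASSUMES all instructions are independent.
struct ToyLoadCluster final : ToyMutation {
  void apply(ToyDAG &dag) override {
    auto &v = dag.instrs;
    const std::size_t before = v.size();
    // Move loads to the front; non-loads keep their relative order.
    auto loadsEnd = std::stable_partition(
        v.begin(), v.end(), [](const ToyInstr &i) { return i.isLoad; });
    // Within the loads, group by base register, then by ascending offset.
    std::stable_sort(v.begin(), loadsEnd,
                     [](const ToyInstr &a, const ToyInstr &b) {
                       if (a.baseReg != b.baseReg) return a.baseReg < b.baseReg;
                       return a.offset < b.offset;
                     });
    assert(v.size() == before);  // reordering must not add or drop anything
    (void)before;
  }
};

std::unique_ptr<ToyMutation> createToyLoadCluster() {
  return std::make_unique<ToyLoadCluster>();
}

// Builds a small instruction list, applies the mutation, returns the order.
std::vector<std::string> demoClusteredOrder() {
  ToyDAG dag;
  dag.instrs = {
      {"ld_a4", true, 0, 4},
      {"add", false, -1, 0},
      {"ld_b0", true, 1, 0},
      {"mul", false, -1, 0},
      {"ld_a0", true, 0, 0},
  };
  dag.addMutation(createToyLoadCluster());
  dag.schedule();
  std::vector<std::string> names;
  for (const auto &i : dag.instrs) names.push_back(i.name);
  return names;
}
```

Calling `demoClusteredOrder()` returns the instruction names after the mutation has run. The registration step, `dag.addMutation(createToyLoadCluster())`, mirrors the shape of the `DAG->addMutation(...)` call quoted in the reply, but nothing here should be read as the signature or behaviour of the LLVM functions themselves.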