thr3ads.net - search: "createloadclusterdagmut"

Displaying 2 results from an estimated 2 matches for "createloadclusterdagmut".

2018 Jun 21

NVPTX - Reordering load instructions

Hi all, I'm looking into the performance difference of a benchmark compiled with NVCC vs NVPTX (coming from Julia, not CUDA C) and I'm seeing a significant difference due to PTX instruction ordering. The relevant source code consists of two nested loops that get fully unrolled, doing some basic arithmetic with values loaded from shared memory:

> #define BLOCK_SIZE 16
> ...


2018 Jun 21

NVPTX - Reordering load instructions

...could make a custom pass, late IR or MI. You might also be able to use the existing instruction-scheduling infrastructure. You can implement ScheduleDAGMutation that does the clustering that you'd like, or if the existing ones do what you want, use those. We have preexisting createLoadClusterDAGMutation and createStoreClusterDAGMutation functions. If you look at AMDGPU/AMDGPUTargetMachine.cpp, you'll see calls like this:

    DAG->addMutation(createLoadClusterDAGMutation(DAG->TII, DAG->TRI));

and I think that you probably want to do the same. ...
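To make the clustering idea more concrete, here is a small self-contained toy model. It is not LLVM code: `Node`, `ToyDAG`, `ToyMutation`, `ToyLoadCluster` and `createToyLoadClusterMutation` are all hypothetical stand-ins for `SUnit`, `ScheduleDAG`, `ScheduleDAGMutation` and `createLoadClusterDAGMutation`. The sketch assumes loads are identified by a base register plus a byte offset. It sorts them by (base, offset) and records "cluster" links between neighbours that share a base, so a scheduler would keep them adjacent. As far as we understand, the real LLVM mutation consults target hooks (via `TII`) to decide whether two memory operations should be clustered. Here that decision is replaced by a fixed maximum cluster size.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical, heavily simplified stand-in for a scheduling-DAG node (not LLVM's SUnit).
struct Node {
    int id;
    bool isLoad;
    int baseReg;                    // base address register of the memory operand
    std::int64_t offset;            // byte offset from the base register
    std::vector<int> clusterSuccs;  // nodes that should be scheduled right after this one
};

struct ToyDAG;

// Stand-in for a DAG mutation: a hook that edits the DAG before scheduling.
struct ToyMutation {
    virtual ~ToyMutation() = default;
    virtual void apply(ToyDAG &dag) = 0;
};

struct ToyDAG {
    std::vector<Node> nodes;
    std::vector<std::unique_ptr<ToyMutation>> mutations;

    // Mirrors the shape of the quoted DAG->addMutation(...) registration call.
    void addMutation(std::unique_ptr<ToyMutation> m) { mutations.push_back(std::move(m)); }
    void runMutations() {
        for (auto &m : mutations) m->apply(*this);
    }
};

// Links loads that share a base register, in increasing offset order,
// capping each cluster at maxClusterSize loads.
struct ToyLoadCluster : ToyMutation {
    std::size_t maxClusterSize;

    explicit ToyLoadCluster(std::size_t maxSize) : maxClusterSize(maxSize) {
        assert(maxSize >= 1 && "cluster size must be at least 1");
    }

    void apply(ToyDAG &dag) override {
        std::vector<Node *> loads;
        for (auto &n : dag.nodes)
            if (n.isLoad) loads.push_back(&n);

        // Sort by (base, offset) so loads from the same base become neighbours.
        std::sort(loads.begin(), loads.end(), [](const Node *a, const Node *b) {
            if (a->baseReg != b->baseReg) return a->baseReg < b->baseReg;
            return a->offset < b->offset;
        });

        std::size_t runLength = 1;  // loads in the current cluster so far
        for (std::size_t i = 1; i < loads.size(); ++i) {
            Node *prev = loads[i - 1];
            Node *cur = loads[i];
            if (prev->baseReg == cur->baseReg && runLength < maxClusterSize) {
                prev->clusterSuccs.push_back(cur->id);  // keep cur next to prev
                ++runLength;
            } else {
                runLength = 1;  // different base or cluster full: start a new cluster
            }
        }
    }
};

// Hypothetical factory, named by analogy with createLoadClusterDAGMutation.
std::unique_ptr<ToyMutation> createToyLoadClusterMutation(std::size_t maxSize) {
    return std::make_unique<ToyLoadCluster>(maxSize);
}
```

For example, three loads from base register 1 at offsets 8, 0 and 4 end up chained in offset order (0, then 4, then 8). A load from a different base, or a non-load node, gets no cluster link. A real scheduler would then treat these links as soft constraints when picking the instruction order.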
