Fangqing Du via llvm-dev
2021-Jul-30 17:23 UTC
[llvm-dev] which pass can do following optimization? gvn-sink?
Dear all,

Imagine we have the following code:

    #define ny 10
    #define Batch_Size 10

    typedef float data_t;

    void foo(data_t out[ny][Batch_Size], data_t max[Batch_Size]);

    void Softmax_Activation(data_t l_Z2[ny][Batch_Size],
                            data_t out[ny][Batch_Size]) {

      data_t max[Batch_Size];

    SA_MAX2:
      for (int i = 0; i < Batch_Size; i++) {
        max[i] = 0;
      SA_MAX1:
        for (int j = 0; j < ny; j++) {
          if (l_Z2[j][i] > max[i])
            max[i] = l_Z2[j][i];
        }
      }
      foo(out, max);
    }

We can see that the address of 'max[i]' is invariant in loop 'SA_MAX1', so I want to know which pass can do the following transformation/optimization:

    #define ny 10
    #define Batch_Size 10

    typedef float data_t;

    void foo(data_t out[ny][Batch_Size], data_t max[Batch_Size]);

    void Softmax_Activation(data_t l_Z2[ny][Batch_Size],
                            data_t out[ny][Batch_Size]) {

      data_t max[Batch_Size];

    SA_MAX2:
      for (int i = 0; i < Batch_Size; i++) {
        data_t Max = 0;
      SA_MAX1:
        for (int j = 0; j < ny; j++) {
          if (l_Z2[j][i] > Max)
            Max = l_Z2[j][i];
        }
        max[i] = Max;
      }
      foo(out, max);
    }

This uses a local scalar 'Max' in place of the original 'max[i]' and sinks the original write out of loop 'SA_MAX1'.

I did some experiments on godbolt, and it looks like we currently don't have this kind of optimization.
https://godbolt.org/z/9PK3hYvPs

Do you know which pass can do this? Or is it not necessary for CPUs?

Thanks,
Fangqing
Xilinx Inc.
Michael Kruse via llvm-dev
2021-Jul-30 18:04 UTC
[llvm-dev] which pass can do following optimization? gvn-sink?
This kind of optimization is done by the LICM pass. Look for promoteLoopAccessesToScalars in LICM.cpp. However, it requires the loop code to be executed unconditionally (or isSafeToExecuteUnconditionally). See the justification in the comment for promoteLoopAccessesToScalars.

Michael
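To illustrate the unconditional-execution requirement mentioned in the reply, here is a minimal sketch of a source-level variant in which 'max[i]' is written on every inner iteration instead of only when the branch is taken. Whether LICM then actually promotes the access has not been verified here; it may depend on the LLVM version and on what alias and capture analysis can prove, so the generated IR should be checked. The body of 'foo' is a hypothetical stand-in (the thread only declares it), and 'Softmax_Activation_uncond', 'variants_agree', and 'self_check' are names made up for this example.

```c
#include <assert.h>

#define ny 10
#define Batch_Size 10

typedef float data_t;

/* Hypothetical stand-in for the thread's external foo(): copies max into
   the first row of out so the result can be observed. */
static void foo(data_t out[ny][Batch_Size], data_t max[Batch_Size]) {
  for (int i = 0; i < Batch_Size; i++)
    out[0][i] = max[i];
}

/* Original form: the store to max[i] happens only when the branch is taken. */
void Softmax_Activation(data_t l_Z2[ny][Batch_Size],
                        data_t out[ny][Batch_Size]) {
  data_t max[Batch_Size];
  for (int i = 0; i < Batch_Size; i++) {
    max[i] = 0;
    for (int j = 0; j < ny; j++) {
      if (l_Z2[j][i] > max[i])
        max[i] = l_Z2[j][i];
    }
  }
  foo(out, max);
}

/* Variant (assumption of this sketch): max[i] is stored on every inner
   iteration, so the store no longer depends on the comparison. */
void Softmax_Activation_uncond(data_t l_Z2[ny][Batch_Size],
                               data_t out[ny][Batch_Size]) {
  data_t max[Batch_Size];
  for (int i = 0; i < Batch_Size; i++) {
    max[i] = 0;
    for (int j = 0; j < ny; j++) {
      data_t v = l_Z2[j][i];
      max[i] = (v > max[i]) ? v : max[i];
    }
  }
  foo(out, max);
}

/* Returns 1 when both variants produce the same maxima on a sample input. */
int variants_agree(void) {
  data_t in[ny][Batch_Size];
  data_t out_a[ny][Batch_Size] = {{0}};
  data_t out_b[ny][Batch_Size] = {{0}};
  for (int j = 0; j < ny; j++)
    for (int i = 0; i < Batch_Size; i++)
      in[j][i] = (data_t)((j * 7 + i * 3) % 11) - 4.0f;
  Softmax_Activation(in, out_a);
  Softmax_Activation_uncond(in, out_b);
  for (int i = 0; i < Batch_Size; i++)
    if (out_a[0][i] != out_b[0][i])
      return 0;
  return 1;
}

/* Optional self-check: aborts if the two variants disagree. */
void self_check(void) { assert(variants_agree()); }
```

Both variants compute the same result; the difference is only in whether the store is conditional. Comparing the optimized IR of the two functions (for example on godbolt) would show whether the inner-loop store is promoted in either form.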