Shixiong Xu via llvm-dev
2018-Jun-01 21:14 UTC
[llvm-dev] [VPlan] about vectorization factor selection
Hi, Current loop vectorizer uses a range of vectorization factors computed by MaxVF. For each VF, it setups unform and scalar info before building VPlan and the final best VF selection. The best VF is also selected within the VF range. for (unsigned VF = 1; VF <= MaxVF; VF *= 2) { // Collect Uniform and Scalar instructions after vectorization with VF. CM.collectUniformsAndScalars(VF); // Collect the instructions (and their associated costs) that will be more // profitable to scalarize. if (VF > 1) CM.collectInstsToScalarize(VF); } It looks when force vectorization is not given, it is not necessary to setup uniform and scalar info for every VF. For a VF, we can do a check before collectUniformsAndScalars() and collectInstsToScalarize() to see if the types used in the code can actually yield any vector types(after type legalization) or not. If not, there is no point for this VF to participate in VPlan and VF selection. As the (scalar) types can be collected once for all VFs, I guess it is cheap enough. As both collectUniformsAndScalars() and collectInstsToScalarize() don't look cheap, doing such check can speed up vectorization, in particular, for large MaxVFs. Another minor thing is when force vectorization is enabled and MaxVF > 1, expected cost of VF=2 is computed twice at the moment. bool ForceVectorization = Hints->getForce() == LoopVectorizeHints::FK_Enabled; // Ignore scalar width, because the user explicitly wants vectorization. if (ForceVectorization && MaxVF > 1) { Width = 2; Cost = expectedCost(Width).first / (float)Width; } for (unsigned i = 2; i <= MaxVF; i *= 2) { // Notice that the vector loop needs to be executed less times, so // we need to divide the cost of the vector loops by the width of // the vector elements. VectorizationCostTy C = expectedCost(i); float VectorCost = C.first / (float)i; Cheers, Shixiong (Jason) Xu -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20180601/ec49697c/attachment.html>
Caballero, Diego via llvm-dev
2018-Jun-05 16:45 UTC
[llvm-dev] [VPlan] about vectorization factor selection
Hi Xu, Thanks for pointing this out and sorry for the delayed response.> It looks when force vectorization is not given, it is not necessary to setup uniform and scalar info for every VF. For a VF, we can do a check before collectUniformsAndScalars() and collectInstsToScalarize() to see if the types used in the code can actually yield any vector types(after type legalization) or not. If not, there is no point for this VF to participate in VPlan and VF selection. As the (scalar) types can be collected once for all VFs, I guess it is cheap enough. As both collectUniformsAndScalars() and collectInstsToScalarize() don't look cheap, doing such check can speed up vectorization, in particular, for large MaxVFs.I'm not sure I understand your proposal. Please, note that uniform values may vary from VF to VF. For example, a branch condition can be uniform (and be kept scalar) for VF=4 but can be divergent for VF=8. For that reason we have to compute this information for every potential VF since it can be different. Another minor thing is when force vectorization is enabled and MaxVF > 1, expected cost of VF=2 is computed twice at the moment. Agreed. It would be great if you can submit a patch for that or open a bug. I guess that we just need to introduce a MinVF that is set to 4 when vectorization is forced. Thanks! Diego From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of Shixiong Xu via llvm-dev Sent: Friday, June 1, 2018 2:14 PM To: llvm-dev at lists.llvm.org Subject: [llvm-dev] [VPlan] about vectorization factor selection Hi, Current loop vectorizer uses a range of vectorization factors computed by MaxVF. For each VF, it setups unform and scalar info before building VPlan and the final best VF selection. The best VF is also selected within the VF range. for (unsigned VF = 1; VF <= MaxVF; VF *= 2) { // Collect Uniform and Scalar instructions after vectorization with VF. CM.collectUniformsAndScalars(VF); // Collect the instructions (and their associated costs) that will be more // profitable to scalarize. if (VF > 1) CM.collectInstsToScalarize(VF); } It looks when force vectorization is not given, it is not necessary to setup uniform and scalar info for every VF. For a VF, we can do a check before collectUniformsAndScalars() and collectInstsToScalarize() to see if the types used in the code can actually yield any vector types(after type legalization) or not. If not, there is no point for this VF to participate in VPlan and VF selection. As the (scalar) types can be collected once for all VFs, I guess it is cheap enough. As both collectUniformsAndScalars() and collectInstsToScalarize() don't look cheap, doing such check can speed up vectorization, in particular, for large MaxVFs. Another minor thing is when force vectorization is enabled and MaxVF > 1, expected cost of VF=2 is computed twice at the moment. bool ForceVectorization = Hints->getForce() == LoopVectorizeHints::FK_Enabled; // Ignore scalar width, because the user explicitly wants vectorization. if (ForceVectorization && MaxVF > 1) { Width = 2; Cost = expectedCost(Width).first / (float)Width; } for (unsigned i = 2; i <= MaxVF; i *= 2) { // Notice that the vector loop needs to be executed less times, so // we need to divide the cost of the vector loops by the width of // the vector elements. VectorizationCostTy C = expectedCost(i); float VectorCost = C.first / (float)i; Cheers, Shixiong (Jason) Xu -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20180605/4b5fe6b6/attachment.html>
Shixiong Xu via llvm-dev
2018-Jun-05 17:15 UTC
[llvm-dev] [VPlan] about vectorization factor selection
Hi Diego, Thanks for your reply. :) Shixiong From: Caballero, Diego <diego.caballero at intel.com> Sent: Tuesday, June 05, 2018 5:45 PM To: Shixiong Xu <shixiong at cadence.com> Cc: llvm-dev at lists.llvm.org Subject: RE: [VPlan] about vectorization factor selection EXTERNAL MAIL Hi Xu, Thanks for pointing this out and sorry for the delayed response.> It looks when force vectorization is not given, it is not necessary to setup uniform and scalar info for every VF. For a VF, we can do a check before collectUniformsAndScalars() and collectInstsToScalarize() to see if the types used in the code can actually yield any vector types(after type legalization) or not. If not, there is no point for this VF to participate in VPlan and VF selection. As the (scalar) types can be collected once for all VFs, I guess it is cheap enough. As both collectUniformsAndScalars() and collectInstsToScalarize() don't look cheap, doing such check can speed up vectorization, in particular, for large MaxVFs.I'm not sure I understand your proposal. Please, note that uniform values may vary from VF to VF. For example, a branch condition can be uniform (and be kept scalar) for VF=4 but can be divergent for VF=8. For that reason we have to compute this information for every potential VF since it can be different. Another minor thing is when force vectorization is enabled and MaxVF > 1, expected cost of VF=2 is computed twice at the moment. Agreed. It would be great if you can submit a patch for that or open a bug. I guess that we just need to introduce a MinVF that is set to 4 when vectorization is forced. Thanks! Diego From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of Shixiong Xu via llvm-dev Sent: Friday, June 1, 2018 2:14 PM To: llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org> Subject: [llvm-dev] [VPlan] about vectorization factor selection Hi, Current loop vectorizer uses a range of vectorization factors computed by MaxVF. For each VF, it setups unform and scalar info before building VPlan and the final best VF selection. The best VF is also selected within the VF range. for (unsigned VF = 1; VF <= MaxVF; VF *= 2) { // Collect Uniform and Scalar instructions after vectorization with VF. CM.collectUniformsAndScalars(VF); // Collect the instructions (and their associated costs) that will be more // profitable to scalarize. if (VF > 1) CM.collectInstsToScalarize(VF); } It looks when force vectorization is not given, it is not necessary to setup uniform and scalar info for every VF. For a VF, we can do a check before collectUniformsAndScalars() and collectInstsToScalarize() to see if the types used in the code can actually yield any vector types(after type legalization) or not. If not, there is no point for this VF to participate in VPlan and VF selection. As the (scalar) types can be collected once for all VFs, I guess it is cheap enough. As both collectUniformsAndScalars() and collectInstsToScalarize() don't look cheap, doing such check can speed up vectorization, in particular, for large MaxVFs. Another minor thing is when force vectorization is enabled and MaxVF > 1, expected cost of VF=2 is computed twice at the moment. bool ForceVectorization = Hints->getForce() == LoopVectorizeHints::FK_Enabled; // Ignore scalar width, because the user explicitly wants vectorization. if (ForceVectorization && MaxVF > 1) { Width = 2; Cost = expectedCost(Width).first / (float)Width; } for (unsigned i = 2; i <= MaxVF; i *= 2) { // Notice that the vector loop needs to be executed less times, so // we need to divide the cost of the vector loops by the width of // the vector elements. VectorizationCostTy C = expectedCost(i); float VectorCost = C.first / (float)i; Cheers, Shixiong (Jason) Xu -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20180605/8552a8d5/attachment.html>