Brian Gesiak via llvm-dev
2019-Dec-02 16:00 UTC
[llvm-dev] [cfe-dev][RFC] Identifying wasteful template function bodies
I work on a C++ project for which compilation time is a significant concern. One of my colleagues was able to significantly shorten the time Clang took to compile our project, by manually outlining independently-typed code from large template functions.

This makes intuitive sense to me, because when instantiating a template function, Clang traverses the body of the function. The longer the function body, the more nodes in the AST Clang has to traverse, and the more time it takes. Programmers can read the function and see that some statements in the function body remain the same no matter what types the function is instantiated with. By extracting these statements into a separate, non-template function, programmers can reduce the number of nodes Clang must traverse.

I created a contrived example that demonstrates how splitting up a long template function can improve compile time. (Beware, the files are large: I needed something that would take Clang a hefty amount of time to process.)
https://gist.github.com/modocache/77b8ac09280c08bd88f84b92ff43a28b

In the example above, 'example.cpp' defines a template function 'foo<T, U, V, W, X, Y, Z>', whose body is ~46k LoC. It then instantiates 'foo' 10 times, with 10 different combinations of template type parameters. Compiling it with 'clang -c -O1 example.cpp -Xclang -disable-llvm-passes -Xclang -emit-llvm' takes ~35 seconds in total. Each additional instantiation of 'foo' adds ~3 seconds to the total compile time.

Only the last statement in 'foo' is dependent upon the template type parameters to 'foo'. 'example-outlined.cpp' moves ~46k LoC of independently-typed statements out of 'foo' and into a function named 'foo_prologue_outlined', and has 'foo' call 'foo_prologue_outlined'. 'foo_prologue_outlined' is not a template function. The result is identical program behavior, but a total compile time of just ~5 seconds (~85% less time). Additional instantiations of 'foo' in 'example-outlined.cpp' cost almost no additional compile time.

Although the functions in our project are not as long, some of them take significantly longer than 35 seconds to compile. By outlining independently-typed statements, we've been able to reduce the compile time of some functions from 300s to 200s (a one-third reduction). So, my colleagues and I are looking for other functions we can manually outline in order to reduce the amount of time Clang takes to compile our project. To this end, it would be handy if Clang could tell us, for example, “hey, I just instantiated 'bar<int, float, double>', but X% of the statements in that function did not require transformation,” where 'X%' is some threshold that could be set in the compiler invocation. For now I'm thinking the option to set this warning threshold could be called '-Wwasteful-template-threshold=' -- but I'm aware that sounds awkward, and I'd love suggestions for a better name.

I think implementing this feature is possible by adding some state to TreeTransform, or to the Clang template instantiators that derive from that class. But before I send a patch to do so, I'm curious if anyone has attempted such a thing before, or if anyone has thoughts or comments on this feature. I'd prefer not to spend time implementing this diagnostic in Clang if it's predestined to be rejected in code review, so please let me know what you think!

(I've cc'ed some contributors who I think have worked in this space, like @rnk, or those who might have better naming suggestions, like @rtrieu.)

- Brian Gesiak
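As a rough illustration of the manual outlining described above (this is not the code from the gist), here is a minimal C++ sketch. The function bodies are tiny stand-ins for the ~46k LoC prologue, and the names 'foo_outlined' and 'check_equivalence' are hypothetical, used only so that both versions can live in one file and be compared:

```cpp
#include <cassert>
#include <vector>

// Before: the whole body lives in the template, so the type-independent
// loop is traversed again for every instantiation of foo.
template <typename T>
T foo(const std::vector<int>& values, T scale) {
    int sum = 0;
    for (int v : values) {
        sum += v;  // does not depend on T
    }
    return static_cast<T>(sum) * scale;  // the only statement that depends on T
}

// The type-independent statements, moved into a non-template function.
// Its body is processed once, however many times the template is instantiated.
int foo_prologue_outlined(const std::vector<int>& values) {
    int sum = 0;
    for (int v : values) {
        sum += v;
    }
    return sum;
}

// After: the template keeps only the statement that depends on T.
// (Hypothetical name; in the gist the outlined version is still called foo.)
template <typename T>
T foo_outlined(const std::vector<int>& values, T scale) {
    const int sum = foo_prologue_outlined(values);
    return static_cast<T>(sum) * scale;
}

// Small self-check that both versions compute the same results.
void check_equivalence() {
    const std::vector<int> values{1, 2, 3};
    assert(foo(values, 2) == foo_outlined(values, 2));
    assert(foo(values, 0.5) == foo_outlined(values, 0.5));
}
```

With a body this small the compile-time difference would not be measurable; the sketch only shows the shape of the transformation, in which the per-instantiation work shrinks to the statements that actually mention a template parameter.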
Brian Gesiak via llvm-dev
2019-Dec-02 16:48 UTC
[llvm-dev] [cfe-dev][RFC] Identifying wasteful template function bodies
Oops, sorry, I meant to send this to cfe-dev! Looking forward to any and all advice/opinions, though. Thanks!

- Brian