Hi,

Even though this question does not only concern LLVM, it seems that only compiler gurus can answer it. So I am giving it a try here, hoping for the best.

In a recent talk by Chandler Carruth, “Performance with algorithms, efficiency with data structures” (https://www.youtube.com/watch?v=fHNmRkzxHWs), Chandler says that one should never return by reference for efficiency reasons. Although I totally agree with him in most cases, because pointers make it difficult for a compiler to optimise, I still don’t always have an efficient solution with value semantics. Here is the case I am thinking of:

===
std::vector<double> f(std::size_t i);

auto v = std::vector<double>( n );
for (std::size_t i = 0; i < 1000; ++i) {
  auto w = f(i);
  for (std::size_t k = 0; k < v.size(); ++k) {
    v[k] += w[k];
  }
}
===

which would be way slower than

===
void f(std::size_t i, std::vector<double>& w);

auto v = std::vector<double>( n );
auto w = std::vector<double>( n );
for (std::size_t i = 0; i < 1000; ++i) {
  f(i, w);
  for (std::size_t k = 0; k < v.size(); ++k) {
    v[k] += w[k];
  }
}
===

because there is no memory allocation in the i-loop, inside the f-call. In the Q&A, where someone seems to give him such an example (at 1:06:46), he says that smart compilers such as LLVM can deduplicate memory allocations. That does not seem to me to be applicable to this kind of algorithm. Does anyone have a concrete example where a compiler deduplicates memory allocation?

Thanks,
François
On Fri, May 1, 2015 at 8:34 AM, François Fayard <fayard.francois at icloud.com> wrote:
> Hi,
>
> Even though this question does not only concern LLVM, it seems that only
> compiler gurus can answer it. So I am giving it a try here, hoping for the
> best.
>
> In a recent talk by Chandler Carruth, “Performance with algorithms,
> efficiency with data structures” (https://www.youtube.com/watch?v=fHNmRkzxHWs),
> Chandler says that one should never return by reference for efficiency
> reasons. Although I totally agree with him in most cases, because pointers
> make it difficult for a compiler to optimise, I still don’t always have an
> efficient solution with value semantics. Here is the case I am thinking of:
>
> ===
> std::vector<double> f(std::size_t i);
>
> auto v = std::vector<double>( n );
> for (std::size_t i = 0; i < 1000; ++i) {
>   auto w = f(i);
>   for (std::size_t k = 0; k < v.size(); ++k) {
>     v[k] += w[k];
>   }
> }
> ===
>
> which would be way slower than
>
> ===
> void f(std::size_t i, std::vector<double>& w);
>
> auto v = std::vector<double>( n );
> auto w = std::vector<double>( n );
> for (std::size_t i = 0; i < 1000; ++i) {
>   f(i, w);
>   for (std::size_t k = 0; k < v.size(); ++k) {
>     v[k] += w[k];
>   }
> }

Why?
Look up "Return value optimization".
:)
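For reference, a minimal sketch of what RVO/NRVO buys you (the names here are purely illustrative): the returned vector is constructed directly in the caller's storage, so returning by value costs no copy of the elements.

#include <cstddef>
#include <vector>

// Sketch only: with RVO/NRVO the vector returned by value is constructed
// directly in the caller's object, so "return v;" performs no element copy.
std::vector<double> make_values(std::size_t n) {
    std::vector<double> v(n);                 // one heap allocation here
    for (std::size_t k = 0; k < n; ++k) {
        v[k] = static_cast<double>(k);
    }
    return v;                                  // copy elided: no second allocation, no copy
}

int main() {
    auto w = make_values(10);                  // w is the object built inside make_values
    return static_cast<int>(w[9]);
}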
Of course, the allocation occurs only once per iteration of the i-loop in the first example, inside the f function. There are two copy elisions happening: one when the function returns and one when w is copy-constructed. But we still have one memory allocation inside f. With reference semantics, we don’t have a single memory allocation in the i-loop.

> On 01 May 2015, at 17:48, Daniel Berlin <dberlin at dberlin.org> wrote:
>
> Why?
> Look up "Return value optimization".
> :)
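To make that concrete, here is a rough sketch (the allocation counter via a replaced global operator new is just my own instrumentation): the by-value version still hits the allocator once per iteration even though the return itself is elided.

#include <cstdio>
#include <cstdlib>
#include <new>
#include <vector>

// Count every global allocation so we can see how often operator new runs.
static std::size_t g_allocs = 0;

void* operator new(std::size_t size) {
    ++g_allocs;
    if (void* p = std::malloc(size)) return p;
    throw std::bad_alloc();
}
void operator delete(void* p) noexcept { std::free(p); }

std::vector<double> f_val(std::size_t i, std::size_t n) {
    std::vector<double> w(n);                  // allocation happens here, every call
    for (std::size_t k = 0; k < n; ++k) w[k] = static_cast<double>(k + i);
    return w;                                  // the return is elided, but the allocation already happened
}

int main() {
    const std::size_t n = 10;
    std::vector<double> v(n, 0.0);
    for (std::size_t i = 0; i < 1000; ++i) {
        auto w = f_val(i, n);
        for (std::size_t k = 0; k < n; ++k) v[k] += w[k];
    }
    std::printf("allocations: %zu\n", g_allocs);   // roughly 1001: one per iteration plus one for v
    return 0;
}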
On Fri, May 1, 2015 at 8:34 AM, François Fayard <fayard.francois at icloud.com> wrote:
> Hi,
>
> Even though this question does not only concern LLVM, it seems that only
> compiler gurus can answer it. So I am giving it a try here, hoping for the
> best.
>
> In a recent talk by Chandler Carruth, “Performance with algorithms,
> efficiency with data structures” (https://www.youtube.com/watch?v=fHNmRkzxHWs),
> Chandler says that one should never return by reference for efficiency
> reasons. Although I totally agree with him in most cases, because pointers
> make it difficult for a compiler to optimise, I still don’t always have an
> efficient solution with value semantics. Here is the case I am thinking of:
>
> ===
> std::vector<double> f(std::size_t i);
>
> auto v = std::vector<double>( n );
> for (std::size_t i = 0; i < 1000; ++i) {
>   auto w = f(i);
>   for (std::size_t k = 0; k < v.size(); ++k) {
>     v[k] += w[k];
>   }
> }
> ===
>
> which would be way slower than
>
> ===
> void f(std::size_t i, std::vector<double>& w);
>
> auto v = std::vector<double>( n );
> auto w = std::vector<double>( n );
> for (std::size_t i = 0; i < 1000; ++i) {
>   f(i, w);
>   for (std::size_t k = 0; k < v.size(); ++k) {
>     v[k] += w[k];
>   }
> }
> ===
>
> because there is no memory allocation in the i-loop, inside the f-call. In
> the Q&A, where someone seems to give him such an example (at 1:06:46), he
> says that smart compilers such as LLVM can deduplicate memory allocations.
> That does not seem to me to be applicable to this kind of algorithm. Does
> anyone have a concrete example where a compiler deduplicates memory
> allocation?

Probably the easiest way to give the compiler an opportunity here would be to have 'f's definition available for inlining. A compiler might be able to inline it and then, depending on the particular allocation behavior of 'f', the allocation /may/ be reused. (I don't think we do anything like reusing yet - though we can, to some degree, inline a bunch of std::vector's ops and then just remove the whole std::vector thing in favor of a stack allocation, for example.)

> Thanks,
> François
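To illustrate the kind of end state I mean, here is a hand-written sketch of what such a transformation could in principle produce once 'f' is inlined and the size is known to be a small compile-time constant. This is what a compiler could conceptually emit, not a claim about what clang does today.

#include <cstddef>

// Hypothetical result of inlining f and promoting its std::vector to the stack.
void accumulate(double* v) {
    constexpr std::size_t n = 10;
    for (std::size_t i = 0; i < 1000; ++i) {
        double w[n];                              // was: std::vector<double> w(n) inside f
        for (std::size_t k = 0; k < n; ++k)
            w[k] = static_cast<double>(k + i);    // body of f, now inlined
        for (std::size_t k = 0; k < n; ++k)
            v[k] += w[k];
    }
}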
> On 01 May 2015, at 18:11, Daniel Berlin <dberlin at dberlin.org> wrote:
>
> There will be no allocation in the first example. it will transform
> the first into the equivalent of the second.

The following code is much faster with reference semantics than with value semantics. The “value” code is not transformed into the “reference” code by LLVM. Just compile this code, and the difference in timings shows.

#include <iostream>
#include <vector>

std::vector<double> f_val(std::size_t i, std::size_t n) {
  auto v = std::vector<double>( n );
  for (std::size_t k = 0; k < v.size(); ++k) {
    v[k] = static_cast<double>(k + i);
  }
  return v;
}

void f_ref(std::size_t i, std::vector<double>& v) {
  for (std::size_t k = 0; k < v.size(); ++k) {
    v[k] = static_cast<double>(k + i);
  }
}

int main (int argc, char const *argv[])
{
  const auto n = std::size_t{10};

  {
    auto v = std::vector<double>( n, 0.0 );
    for (std::size_t i = 0; i < 100000000; ++i) {
      auto w = f_val(i, n);
      for (std::size_t k = 0; k < v.size(); ++k) {
        v[k] += w[k];
      }
    }
    std::cout << v[n - 1] << std::endl;
  }

  {
    auto v = std::vector<double>( n, 0.0 );
    auto w = std::vector<double>( n );
    for (std::size_t i = 0; i < 100000000; ++i) {
      f_ref(i, w);
      for (std::size_t k = 0; k < v.size(); ++k) {
        v[k] += w[k];
      }
    }
    std::cout << v[n - 1] << std::endl;
  }

  return 0;
}
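In case it helps reproduce the measurement, this is the kind of timing helper I would wrap around each of the two blocks in main() (my own addition; the program above only prints the results, so any reasonable timing approach works just as well):

#include <chrono>
#include <cstdio>

// Prints the wall-clock time spent in the enclosing scope when it is destroyed.
struct ScopedTimer {
    const char* label;
    std::chrono::steady_clock::time_point start;
    explicit ScopedTimer(const char* l)
        : label(l), start(std::chrono::steady_clock::now()) {}
    ~ScopedTimer() {
        const auto end = std::chrono::steady_clock::now();
        std::printf("%s: %.3f s\n", label,
                    std::chrono::duration<double>(end - start).count());
    }
};

// usage:
//   { ScopedTimer t("by value");     /* value-semantics loop */ }
//   { ScopedTimer t("by reference"); /* reference-semantics loop */ }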
> Probably the easiest way to give the compiler an opportunity here would be
> to have 'f's definition available for inlining. A compiler might be able to
> inline it and then, depending on the particular allocation behavior of 'f',
> the allocation /may/ be reused. (I don't think we do anything like reusing
> yet - though we can, to some degree, inline a bunch of std::vector's ops and
> then just remove the whole std::vector thing in favor of a stack allocation,
> for example.)

Do you have any concrete example where std::vector ends up being allocated on the stack?
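This is the kind of minimal test case I have in mind (just a sketch): the whole lifetime of the vector is visible and the size is a small constant, so one can inspect the optimised IR, e.g. with clang++ -O3 -std=c++11 -S -emit-llvm, and look for calls to operator new (_Znwm in the Itanium mangling) to see whether the allocation survived.

#include <vector>

// Everything about the vector is local and the size is a constant; does the
// heap allocation survive optimisation?
double sum_small() {
    std::vector<double> v(4);
    v[0] = 1.0; v[1] = 2.0; v[2] = 3.0; v[3] = 4.0;
    return v[0] + v[1] + v[2] + v[3];
}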
On Fri, May 1, 2015 at 9:24 AM, François Fayard <fayard.francois at icloud.com> wrote:
>
>> On 01 May 2015, at 18:11, Daniel Berlin <dberlin at dberlin.org> wrote:
>>
>> There will be no allocation in the first example. it will transform
>> the first into the equivalent of the second.
>
> The following code is much faster with reference semantics than with value
> semantics. The “value” code is not transformed into the “reference” code by
> LLVM.

This is a question of "what does your current version of clang and LLVM do", not a question of what can be done. It *can*. It may not do it right now, but it, in fact, can.

> Just compile this code, and the difference in timings shows.
>
> #include <iostream>
> #include <vector>
>
> std::vector<double> f_val(std::size_t i, std::size_t n) {
>   auto v = std::vector<double>( n );
>   for (std::size_t k = 0; k < v.size(); ++k) {
>     v[k] = static_cast<double>(k + i);
>   }
>   return v;
> }
>
> void f_ref(std::size_t i, std::vector<double>& v) {
>   for (std::size_t k = 0; k < v.size(); ++k) {
>     v[k] = static_cast<double>(k + i);
>   }
> }
>
> int main (int argc, char const *argv[])
> {
>   const auto n = std::size_t{10};
>
>   {
>     auto v = std::vector<double>( n, 0.0 );
>     for (std::size_t i = 0; i < 100000000; ++i) {
>       auto w = f_val(i, n);
>       for (std::size_t k = 0; k < v.size(); ++k) {
>         v[k] += w[k];
>       }
>     }
>     std::cout << v[n - 1] << std::endl;
>   }
>
>   {
>     auto v = std::vector<double>( n, 0.0 );
>     auto w = std::vector<double>( n );
>     for (std::size_t i = 0; i < 100000000; ++i) {
>       f_ref(i, w);
>       for (std::size_t k = 0; k < v.size(); ++k) {
>         v[k] += w[k];
>       }
>     }
>     std::cout << v[n - 1] << std::endl;
>   }
>
>   return 0;
> }