Hi,

Even though this question does not only concern LLVM, it seems that only compiler gurus can answer it. So I am giving it a try here, hoping for the best.

In a recent talk by Chandler Carruth, “Performance with algorithms, efficiency with data structures” (https://www.youtube.com/watch?v=fHNmRkzxHWs), Chandler says that one should never return by reference for efficiency reasons. Although I totally agree with him in most cases, because pointers make it difficult for a compiler to optimise, I still don’t always have an efficient solution with value semantics. Here is the case I am thinking of:

===
std::vector<double> f(std::size_t i);

auto v = std::vector<double>( n );
for (std::size_t i = 0; i < 1000; ++i) {
  auto w = f(i);
  for (std::size_t k = 0; k < v.size(); ++k) {
    v[k] += w[k];
  }
}
===

which would be way slower than

===
void f(std::size_t i, std::vector<double>& w);

auto v = std::vector<double>( n );
auto w = std::vector<double>( n );
for (std::size_t i = 0; i < 1000; ++i) {
  f(i, w);
  for (std::size_t k = 0; k < v.size(); ++k) {
    v[k] += w[k];
  }
}
===

because there is no memory allocation in the i-loop, inside the f-call. In the Q&A, where someone seems to give him such an example (at 1:06:46), he says that smart compilers such as LLVM can deduplicate memory allocations. That does not seem to me to be applicable to this kind of algorithm. Does anyone have a concrete example where a compiler deduplicates memory allocation?

Thanks,
François
On Fri, May 1, 2015 at 8:34 AM, François Fayard <fayard.francois at icloud.com> wrote:
> Hi,
>
> Even though this question does not only concern LLVM, it seems that only
> compiler gurus can answer it. So I am giving it a try here, hoping for the
> best.
>
> In a recent talk by Chandler Carruth, “Performance with algorithms,
> efficiency with data structures” (https://www.youtube.com/watch?v=fHNmRkzxHWs),
> Chandler says that one should never return by reference for efficiency
> reasons. Although I totally agree with him in most cases, because pointers
> make it difficult for a compiler to optimise, I still don’t always have an
> efficient solution with value semantics. Here is the case I am thinking of:
>
> ===
> std::vector<double> f(std::size_t i);
>
> auto v = std::vector<double>( n );
> for (std::size_t i = 0; i < 1000; ++i) {
>   auto w = f(i);
>   for (std::size_t k = 0; k < v.size(); ++k) {
>     v[k] += w[k];
>   }
> }
> ===
>
> which would be way slower than
>
> ===
> void f(std::size_t i, std::vector<double>& w);
>
> auto v = std::vector<double>( n );
> auto w = std::vector<double>( n );
> for (std::size_t i = 0; i < 1000; ++i) {
>   f(i, w);
>   for (std::size_t k = 0; k < v.size(); ++k) {
>     v[k] += w[k];
>   }
> }

Why?
Look up "Return value optimization".
:)
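For reference, a minimal sketch of what RVO/NRVO buys you (the names here are purely illustrative): the returned vector is constructed directly in the caller's storage, so returning by value costs no copy of the elements.

#include <cstddef>
#include <vector>

// Sketch only: with RVO/NRVO the vector returned by value is constructed
// directly in the caller's object, so "return v;" performs no element copy.
std::vector<double> make_values(std::size_t n) {
    std::vector<double> v(n);                 // one heap allocation here
    for (std::size_t k = 0; k < n; ++k) {
        v[k] = static_cast<double>(k);
    }
    return v;                                  // copy elided: no second allocation, no copy
}

int main() {
    auto w = make_values(10);                  // w is the object built inside make_values
    return static_cast<int>(w[9]);
}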
Of course, the allocation occurs only once per iteration of the i-loop in the first example, inside the f function. There are two copy elisions happening: one when the function returns and one when w is copy-constructed. But we still have one memory allocation inside f. With reference semantics, we don’t have a single memory allocation in the i-loop.

> On 01 May 2015, at 17:48, Daniel Berlin <dberlin at dberlin.org> wrote:
>
> Why?
> Look up "Return value optimization".
> :)
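To make that concrete, here is a rough sketch (the allocation counter via a replaced global operator new is just my own instrumentation): the by-value version still hits the allocator once per iteration even though the return itself is elided.

#include <cstdio>
#include <cstdlib>
#include <new>
#include <vector>

// Count every global allocation so we can see how often operator new runs.
static std::size_t g_allocs = 0;

void* operator new(std::size_t size) {
    ++g_allocs;
    if (void* p = std::malloc(size)) return p;
    throw std::bad_alloc();
}
void operator delete(void* p) noexcept { std::free(p); }

std::vector<double> f_val(std::size_t i, std::size_t n) {
    std::vector<double> w(n);                  // allocation happens here, every call
    for (std::size_t k = 0; k < n; ++k) w[k] = static_cast<double>(k + i);
    return w;                                  // the return is elided, but the allocation already happened
}

int main() {
    const std::size_t n = 10;
    std::vector<double> v(n, 0.0);
    for (std::size_t i = 0; i < 1000; ++i) {
        auto w = f_val(i, n);
        for (std::size_t k = 0; k < n; ++k) v[k] += w[k];
    }
    std::printf("allocations: %zu\n", g_allocs);   // roughly 1001: one per iteration plus one for v
    return 0;
}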
On Fri, May 1, 2015 at 8:34 AM, François Fayard <fayard.francois at icloud.com> wrote:
> Hi,
>
> Even though this question does not only concern LLVM, it seems that only
> compiler gurus can answer it. So I am giving it a try here, hoping for the
> best.
>
> In a recent talk by Chandler Carruth, “Performance with algorithms,
> efficiency with data structures” (https://www.youtube.com/watch?v=fHNmRkzxHWs),
> Chandler says that one should never return by reference for efficiency
> reasons. Although I totally agree with him in most cases, because pointers
> make it difficult for a compiler to optimise, I still don’t always have an
> efficient solution with value semantics. Here is the case I am thinking of:
>
> ===
> std::vector<double> f(std::size_t i);
>
> auto v = std::vector<double>( n );
> for (std::size_t i = 0; i < 1000; ++i) {
>   auto w = f(i);
>   for (std::size_t k = 0; k < v.size(); ++k) {
>     v[k] += w[k];
>   }
> }
> ===
>
> which would be way slower than
>
> ===
> void f(std::size_t i, std::vector<double>& w);
>
> auto v = std::vector<double>( n );
> auto w = std::vector<double>( n );
> for (std::size_t i = 0; i < 1000; ++i) {
>   f(i, w);
>   for (std::size_t k = 0; k < v.size(); ++k) {
>     v[k] += w[k];
>   }
> }
> ===
>
> because there is no memory allocation in the i-loop, inside the f-call. In
> the Q&A, where someone seems to give him such an example (at 1:06:46), he
> says that smart compilers such as LLVM can deduplicate memory allocations.
> That does not seem to me to be applicable to this kind of algorithm. Does
> anyone have a concrete example where a compiler deduplicates memory
> allocation?

Probably the easiest way to give the compiler an opportunity here would be to have 'f's definition available for inlining. A compiler might be able to inline it and then, depending on the particular allocation behavior of 'f', the allocation /may/ be reused. (I don't think we do anything like reusing yet - though we can, to some degree, inline a bunch of std::vector's ops and then just remove the whole std::vector thing in favor of a stack allocation, for example.)

> Thanks,
> François
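To illustrate the kind of end state I mean, here is a hand-written sketch of what such a transformation could in principle produce once 'f' is inlined and the size is known to be a small compile-time constant. This is what a compiler could conceptually emit, not a claim about what clang does today.

#include <cstddef>

// Hypothetical result of inlining f and promoting its std::vector to the stack.
void accumulate(double* v) {
    constexpr std::size_t n = 10;
    for (std::size_t i = 0; i < 1000; ++i) {
        double w[n];                              // was: std::vector<double> w(n) inside f
        for (std::size_t k = 0; k < n; ++k)
            w[k] = static_cast<double>(k + i);    // body of f, now inlined
        for (std::size_t k = 0; k < n; ++k)
            v[k] += w[k];
    }
}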
> On 01 May 2015, at 18:11, Daniel Berlin <dberlin at dberlin.org> wrote:
>
> There will be no allocation in the first example. it will transform
> the first into the equivalent of the second.

The following code is much faster with reference semantics than with value semantics. The “value” code is not transformed into the “reference” code by LLVM. Just compile this code, and the difference in timings shows.

#include <iostream>
#include <vector>

std::vector<double> f_val(std::size_t i, std::size_t n) {
  auto v = std::vector<double>( n );
  for (std::size_t k = 0; k < v.size(); ++k) {
    v[k] = static_cast<double>(k + i);
  }
  return v;
}

void f_ref(std::size_t i, std::vector<double>& v) {
  for (std::size_t k = 0; k < v.size(); ++k) {
    v[k] = static_cast<double>(k + i);
  }
}

int main (int argc, char const *argv[])
{
  const auto n = std::size_t{10};

  {
    auto v = std::vector<double>( n, 0.0 );
    for (std::size_t i = 0; i < 100000000; ++i) {
      auto w = f_val(i, n);
      for (std::size_t k = 0; k < v.size(); ++k) {
        v[k] += w[k];
      }
    }
    std::cout << v[n - 1] << std::endl;
  }

  {
    auto v = std::vector<double>( n, 0.0 );
    auto w = std::vector<double>( n );
    for (std::size_t i = 0; i < 100000000; ++i) {
      f_ref(i, w);
      for (std::size_t k = 0; k < v.size(); ++k) {
        v[k] += w[k];
      }
    }
    std::cout << v[n - 1] << std::endl;
  }

  return 0;
}
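In case it helps reproduce the measurement, this is the kind of timing helper I would wrap around each of the two blocks in main() (my own addition; the program above only prints the results, so any reasonable timing approach works just as well):

#include <chrono>
#include <cstdio>

// Prints the wall-clock time spent in the enclosing scope when it is destroyed.
struct ScopedTimer {
    const char* label;
    std::chrono::steady_clock::time_point start;
    explicit ScopedTimer(const char* l)
        : label(l), start(std::chrono::steady_clock::now()) {}
    ~ScopedTimer() {
        const auto end = std::chrono::steady_clock::now();
        std::printf("%s: %.3f s\n", label,
                    std::chrono::duration<double>(end - start).count());
    }
};

// usage:
//   { ScopedTimer t("by value");     /* value-semantics loop */ }
//   { ScopedTimer t("by reference"); /* reference-semantics loop */ }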
> Probably the easiest way to give the compiler an opportunity here would be
> to have 'f's definition available for inlining. A compiler might be able to
> inline it and then, depending on the particular allocation behavior of 'f',
> the allocation /may/ be reused. (I don't think we do anything like reusing
> yet - though we can, to some degree, inline a bunch of std::vector's ops and
> then just remove the whole std::vector thing in favor of a stack allocation,
> for example.)

Do you have any concrete example where std::vector ends up being allocated on the stack?
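This is the kind of minimal test case I have in mind (just a sketch): the whole lifetime of the vector is visible and the size is a small constant, so one can inspect the optimised IR, e.g. with clang++ -O3 -std=c++11 -S -emit-llvm, and look for calls to operator new (_Znwm in the Itanium mangling) to see whether the allocation survived.

#include <vector>

// Everything about the vector is local and the size is a constant; does the
// heap allocation survive optimisation?
double sum_small() {
    std::vector<double> v(4);
    v[0] = 1.0; v[1] = 2.0; v[2] = 3.0; v[3] = 4.0;
    return v[0] + v[1] + v[2] + v[3];
}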
On Fri, May 1, 2015 at 9:24 AM, François Fayard <fayard.francois at icloud.com> wrote:
>
>> On 01 May 2015, at 18:11, Daniel Berlin <dberlin at dberlin.org> wrote:
>>
>> There will be no allocation in the first example. it will transform
>> the first into the equivalent of the second.
>
> The following code is much faster with reference semantics than with value
> semantics. The “value” code is not transformed into the “reference” code by
> LLVM.

This is a question of "what does your current version of clang and LLVM do", not a question of what can be done. It *can*. It may not do it right now, but it, in fact, can.

> Just compile this code, and the difference in timings shows.
>
> #include <iostream>
> #include <vector>
>
> std::vector<double> f_val(std::size_t i, std::size_t n) {
>   auto v = std::vector<double>( n );
>   for (std::size_t k = 0; k < v.size(); ++k) {
>     v[k] = static_cast<double>(k + i);
>   }
>   return v;
> }
>
> void f_ref(std::size_t i, std::vector<double>& v) {
>   for (std::size_t k = 0; k < v.size(); ++k) {
>     v[k] = static_cast<double>(k + i);
>   }
> }
>
> int main (int argc, char const *argv[])
> {
>   const auto n = std::size_t{10};
>
>   {
>     auto v = std::vector<double>( n, 0.0 );
>     for (std::size_t i = 0; i < 100000000; ++i) {
>       auto w = f_val(i, n);
>       for (std::size_t k = 0; k < v.size(); ++k) {
>         v[k] += w[k];
>       }
>     }
>     std::cout << v[n - 1] << std::endl;
>   }
>
>   {
>     auto v = std::vector<double>( n, 0.0 );
>     auto w = std::vector<double>( n );
>     for (std::size_t i = 0; i < 100000000; ++i) {
>       f_ref(i, w);
>       for (std::size_t k = 0; k < v.size(); ++k) {
>         v[k] += w[k];
>       }
>     }
>     std::cout << v[n - 1] << std::endl;
>   }
>
>   return 0;
> }