thr3ads.net - llvm dev - [llvm-dev] LLD: Using sendfile(2) to copy file contents [Jun 2016]

Rui Ueyama via llvm-dev

2016-Jun-06 18:41 UTC

[llvm-dev] LLD: Using sendfile(2) to copy file contents

To leave the kernel an opportunity to optimize, I think mmap+write would be
enough. Because the kernel knows which addresses are mmap'ed, it can detect
that write's source is actually an mmap'ed file, and in that case it can
optimize the copy as it does for sendfile. It seems that Linux doesn't do
that now, though.
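
For illustration, the mmap+write idea can be sketched as below. This is
Python rather than LLD's C++, only to keep it short; `copy_mmap_write` is a
hypothetical helper name, not anything in LLVM. The source file is mapped
read-only and the mapping itself is handed to write(2), so user space makes
no intermediate copy. Whether the kernel then avoids its own copy is exactly
the open question above.

```python
import mmap
import os


def copy_mmap_write(src_path, dst_path):
    """Copy src_path to dst_path by passing an mmap'ed view of the source to write(2).

    Hypothetical helper for illustration; returns the number of bytes written.
    """
    src_fd = os.open(src_path, os.O_RDONLY)
    dst_fd = os.open(dst_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        size = os.fstat(src_fd).st_size
        written = 0
        if size > 0:  # an empty file cannot be mapped
            with mmap.mmap(src_fd, size, access=mmap.ACCESS_READ) as mapped, \
                    memoryview(mapped) as view:
                # write(2) may write fewer bytes than requested, so loop.
                # Slicing a memoryview does not copy the underlying pages.
                while written < size:
                    written += os.write(dst_fd, view[written:])
        return written
    finally:
        os.close(src_fd)
        os.close(dst_fd)
```

From user space this looks like an ordinary write; any sendfile-like
shortcut would have to happen inside the kernel.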

I haven't thought about using non-temporal stores. It may work as we copy
a very large amount of data, but after copying data, we read it back in order
to apply relocations, so I'm not sure if it's going to be an overall win.

Finally, as to asynchronous IO, I'm wondering if it's effective. It seems
that not that many people are using asynchronous IO on Linux, and it is
often the case that rarely used paths are not optimized well. I agree that at
least in theory it could improve throughput, so it's worth a try.

On Mon, Jun 6, 2016 at 10:49 AM, Rafael Espíndola <
rafael.espindola at gmail.com> wrote:
> Thanks a lot for running the experiment. One thing I want to try one day
> is relocating one section at a time in anonymous memory and then using
> async IO (io_submit) to write the final bits. That way the kernel can do IO
> while we relocate other sections.
>
> Cheers,
> Rafael
> On Jun 5, 2016 4:19 PM, "Rui Ueyama via llvm-dev" <llvm-dev at lists.llvm.org>
> wrote:
>
>> This is a short summary of an experiment that I did for the linker.
>>
>> One of the major tasks of the linker is to copy file contents from input
>> object files to an output file. I was wondering what's the fastest way to
>> copy data from one file to another, so I conducted an experiment.
>>
>> Currently, LLD copies file contents using memcpy (input files and an
>> output file are mapped to memory). mmap+memcpy is not known as the fastest
>> way to copy file contents.
>>
>> Linux has the sendfile system call. The system call takes two file
>> descriptors and copies contents from one to another (it used to take only a
>> socket as a destination, but these days it can take any file). It is
>> usually much faster than memcpy to copy files. For example, it is about 3x
>> faster than the cp command to copy large files on my machine (on SSD/ext4).
>>
>> I made a change to LLVM and LLD to use sendfile instead of memcpy to copy
>> section contents. Here's the time to link clang with debug info.
>>
>>     memcpy: 12.96 seconds
>>     sendfile: 12.82 seconds
>>
>> sendfile(2) was slightly faster but not that much. But if you disable
>> string merging (by passing -O0 parameter to the linker), the difference
>> becomes noticeable.
>>
>>     memcpy: 7.85 seconds
>>     sendfile: 6.94 seconds
>>
>> I think it is because, with -O0, the linker has to copy more contents
>> than without -O0. It creates a 2x larger executable than without -O0. As the
>> amount of data the linker needs to copy gets larger, sendfile gets more
>> effective.
>>
>> By the way, gold takes 27.05 seconds to link it.
>>
>> With the results, I'm *not* going to submit that change. There are two
>> reasons. First, the optimization seems too system-specific, and I'm not yet
>> sure if it's always effective even on Linux. Second, the current
>> implementations of MemoryBuffer and FileOutputBuffer are not
>> sendfile(2)-friendly because they close file descriptors immediately after
>> mapping them to memory. My patch is too hacky to submit.
>>
>> That being said, the results clearly show that there's room for future
>> optimization. I think we want to revisit it when we want to do a low-level
>> optimization on link speed.
>>
>> Rui
>>
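
As a rough illustration of the sendfile-based copy described in the quoted
message, the sketch below copies one byte range (think: one section) from an
input file to a given offset in an output file. It is not the patch from the
experiment, which was never posted; `copy_section` and its parameters are
hypothetical names, and Python's `os.sendfile` is used as a thin wrapper
over sendfile(2) on Linux. Both file descriptors have to stay open for the
duration of the copy, which is why buffer classes that close descriptors
right after mapping get in the way.

```python
import os


def copy_section(out_fd, out_offset, in_fd, in_offset, length):
    """Copy `length` bytes from in_fd at in_offset to out_fd at out_offset.

    Hypothetical helper for illustration; returns the number of bytes copied.
    Assumes Linux, where sendfile(2) accepts a regular file as destination.
    """
    # sendfile(2) writes at out_fd's current file position, so seek first.
    os.lseek(out_fd, out_offset, os.SEEK_SET)
    copied = 0
    while copied < length:
        # The read offset is passed explicitly; in_fd's position is untouched.
        sent = os.sendfile(out_fd, in_fd, in_offset + copied, length - copied)
        if sent == 0:  # reached end of the input file
            break
        copied += sent
    return copied
```

A linker-like caller would open each input once and call `copy_section`
once per section with the output offset chosen by layout. The data never
enters the process's address space, so relocations would then have to be
applied to the output by some other means.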

Matt Godbolt via llvm-dev

2016-Jun-06 19:11 UTC

[llvm-dev] LLD: Using sendfile(2) to copy file contents

On Mon, Jun 6, 2016 at 1:41 PM Rui Ueyama via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> As to leave an opportunity for the kernel, I think mmap+write would be
> enough. Because the kernel knows what address is mmap'ed, it can detect
> that write's source is actually a mmap'ed file and if that's the case it
> can optimize as it does for sendfile. It seems that Linux doesn't do that
> now, though.
>
Pardon my ignorance here, but how might the kernel in general know what the
"source" of a write is?

Also, regarding the async IO option: in my (non-LLVM) experimentation
with reading very large files, the aio subsystem was not well supported or
optimized (hence the lack of glibc support).

Rui Ueyama via llvm-dev

2016-Jun-06 19:17 UTC

[llvm-dev] LLD: Using sendfile(2) to copy file contents

On Mon, Jun 6, 2016 at 12:11 PM, Matt Godbolt <matt at godbolt.org> wrote:
>
> On Mon, Jun 6, 2016 at 1:41 PM Rui Ueyama via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> As to leave an opportunity for the kernel, I think mmap+write would be
>> enough. Because the kernel knows what address is mmap'ed, it can detect
>> that write's source is actually a mmap'ed file and if that's the case it
>> can optimize as it does for sendfile. It seems that Linux doesn't do that
>> now, though.
>>
>
> Pardon my ignorance here, but how might the kernel in general know what
> the "source" of a write is?
>
The kernel knows where all mmap'ed files are mapped. So, it can decide
whether a memory address is in a mmap'ed region or not, no?
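
The mapping table in question is visible from user space on Linux through
/proc/self/maps, so the lookup can at least be sketched. The function below
is only an illustration (`find_mapping` is a made-up name, and this is not
how the kernel does it internally): given an address and the text of a maps
file, it finds the containing mapping and reports the backing file and the
file offset that the address corresponds to. Treating a non-zero inode as
"file-backed" is an assumption that holds for ordinary files.

```python
def find_mapping(addr, maps_text):
    """Return details of the mapping containing addr, or None if unmapped.

    maps_text is the content of a /proc/<pid>/maps file. Each line has the form
    "start-end perms offset dev inode [pathname]", with hex start/end/offset.
    """
    for line in maps_text.splitlines():
        fields = line.split(None, 5)  # pathname (6th field) may contain spaces
        if len(fields) < 5:
            continue
        start_s, end_s = fields[0].split("-")
        start, end = int(start_s, 16), int(end_s, 16)
        if start <= addr < end:
            inode = int(fields[4])
            return {
                "start": start,
                "end": end,
                "path": fields[5] if len(fields) > 5 else "",
                # Offset within the backing file that addr corresponds to.
                "file_offset": int(fields[2], 16) + (addr - start),
                # Assumption: anonymous and special mappings report inode 0.
                "file_backed": inode != 0,
            }
    return None
```

For a live process, pass `open("/proc/self/maps").read()` as `maps_text`.
A write whose source buffer resolves to a file-backed mapping gives a file
and an offset, which are the same inputs sendfile takes.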

> Also, in terms of the async_io option, in my (non-llvm) experimentation
> with reading very large files the aio subsystem is not well-supported or
> optimized (hence the lack of Glibc support).
