Rui Ueyama via llvm-dev
2016-Jun-06 19:17 UTC
[llvm-dev] LLD: Using sendfile(2) to copy file contents
On Mon, Jun 6, 2016 at 12:11 PM, Matt Godbolt <matt at godbolt.org> wrote:

> On Mon, Jun 6, 2016 at 1:41 PM Rui Ueyama via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
>> As to leave an opportunity for the kernel, I think mmap+write would be
>> enough. Because the kernel knows what address is mmap'ed, it can detect
>> that write's source is actually a mmap'ed file and if that's the case it
>> can optimize as it does for sendfile. It seems that Linux doesn't do that
>> now, though.
>
> Pardon my ignorance here, but how might the kernel in general know what
> the "source" of a write is?

The kernel knows where all mmap'ed files are mapped. So it can decide
whether a memory address is in an mmap'ed region or not, no?

> Also, in terms of the async_io option, in my (non-llvm) experimentation
> with reading very large files the aio subsystem is not well-supported or
> optimized (hence the lack of Glibc support).
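For context, the sendfile(2) path that the thread compares against could be sketched roughly as below. This is only an illustration, assuming Linux (where sendfile can target a regular file on reasonably recent kernels); the function name copy_with_sendfile is hypothetical and is not taken from LLD.

```c
#include <fcntl.h>
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: copy src to dst using sendfile(2).
 * Returns 0 on success, -1 on any error. Linux-specific. */
int copy_with_sendfile(const char *src, const char *dst) {
    int in = open(src, O_RDONLY);
    if (in < 0)
        return -1;

    struct stat st;
    if (fstat(in, &st) < 0) {
        close(in);
        return -1;
    }

    int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (out < 0) {
        close(in);
        return -1;
    }

    int rc = 0;
    off_t remaining = st.st_size;
    while (remaining > 0) {
        /* With a NULL offset the kernel advances the file offset of
         * `in` itself; the data never passes through user space. */
        ssize_t n = sendfile(out, in, NULL, (size_t)remaining);
        if (n <= 0) {  /* error, or the source shrank unexpectedly */
            rc = -1;
            break;
        }
        remaining -= n;
    }

    close(in);
    if (close(out) < 0)
        rc = -1;
    return rc;
}
```

The loop is there because sendfile may transfer fewer bytes than requested in a single call.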
Matt Godbolt via llvm-dev
2016-Jun-06 19:20 UTC
[llvm-dev] LLD: Using sendfile(2) to copy file contents
Perhaps I misunderstand: the kernel can tell (upon a page fault) what memory address is being written to, and likewise, upon a page fault, which memory has been read from. But it can't put these things together to infer "process A is reading from X and writing to Y": it sees the reads and writes in isolation, and indeed only at page granularity and only when a page fault happens.

On Mon, Jun 6, 2016 at 2:18 PM Rui Ueyama <ruiu at google.com> wrote:

> The kernel knows where all mmap'ed files are mapped. So it can decide
> whether a memory address is in an mmap'ed region or not, no?
Rui Ueyama via llvm-dev
2016-Jun-06 19:34 UTC
[llvm-dev] LLD: Using sendfile(2) to copy file contents
What I mean is doing something like this:

    int in = open(infile, O_RDONLY, 0);
    int out = open(outfile, O_WRONLY | O_CREAT, 0644);
    void *buf = mmap(NULL, filesize, PROT_READ, MAP_PRIVATE, in, 0);
    write(out, buf, filesize);

By write, I mean the write system call, not a read or write in general.

On Mon, Jun 6, 2016 at 12:20 PM, Matt Godbolt <matt at godbolt.org> wrote:

> Perhaps I misunderstand: the kernel can tell (upon a page fault) what
> memory address is being written to, and likewise upon a page fault which
> memory has been read from. But it can't put these things together to infer
> "process A is reading from X and writing to Y": it sees the reads and
> writes in isolation, and indeed only at page granularity and when a page
> fault happens.
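A self-contained version of the mmap+write idea might look like the following. It is a sketch under a few assumptions: the file size comes from fstat, the output is truncated first, and short writes are retried. The name copy_with_mmap_write is made up for illustration. As noted earlier in the thread, Linux does not appear to treat this pattern specially, so the data is still copied out of the mapped pages by write.

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: map src read-only and pass the mapping to write(2).
 * Returns 0 on success, -1 on any error. */
int copy_with_mmap_write(const char *src, const char *dst) {
    int in = open(src, O_RDONLY);
    if (in < 0)
        return -1;

    struct stat st;
    if (fstat(in, &st) < 0) {
        close(in);
        return -1;
    }

    int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (out < 0) {
        close(in);
        return -1;
    }

    int rc = 0;
    if (st.st_size > 0) {  /* mmap rejects a zero-length mapping */
        size_t len = (size_t)st.st_size;
        char *buf = mmap(NULL, len, PROT_READ, MAP_PRIVATE, in, 0);
        if (buf == MAP_FAILED) {
            rc = -1;
        } else {
            size_t done = 0;
            while (done < len) {
                /* The source pointer lies inside the file mapping. */
                ssize_t n = write(out, buf + done, len - done);
                if (n <= 0) {
                    rc = -1;
                    break;
                }
                done += (size_t)n;
            }
            munmap(buf, len);
        }
    }

    close(in);
    if (close(out) < 0)
        rc = -1;
    return rc;
}
```

From the kernel's side, the write call receives only a user-space address and a length; whether it could cheaply recognize that address range as a file mapping is the question under discussion here.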