:
: [Moved from -current to -stable]
:
:On Tuesday, 21 March 2006 16:23, Matthew Dillon wrote:
:> You might be doing just writes to the mmap()'d memory, but the
:> system doesn't know that.
:
:Actually, it does. The program tells it that I don't care to read what's
:currently there, by specifying the PROT_WRITE flag only.
That's an architectural flag. Very few architectures actually support
write-only memory maps. IA32 does not. It does not change the
fact that the operating system must validate the memory underlying
the page, nor does it imply that the system shouldn't.
:Sounds like a missed optimization opportunity :-(
Even on architectures that did support write-only memory maps, the
system would still have to fault in the rest of the data on the page,
because the system would have no way of knowing which bytes in the
page you wrote to (that is, whether you wrote to all the bytes in the
page or whether you left gaps). The system does not take a fault for
every write you issue to the page, only for the first one. So, no
matter how you twist it, the system *MUST* validate the entire page
when it takes the page fault.
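To make that concrete, here is a minimal illustrative sketch (not taken from
the program under discussion; the file name is made up) of a sparse write
pattern and the per-page faulting it triggers:

/*
 * Illustrative sketch only: map an existing file writable and store one
 * byte per page.  The first store to each page takes exactly one page
 * fault, and on that fault the kernel must read the entire underlying
 * page from the file, because it cannot know whether the program will
 * later fill the whole page or leave gaps.
 */
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int
main(void)
{
        struct stat st;
        int fd = open("data.bin", O_RDWR);      /* hypothetical file */

        if (fd < 0 || fstat(fd, &st) < 0)
                return (1);
        size_t len = (size_t)st.st_size;
        char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (p == MAP_FAILED)
                return (1);

        long pgsz = sysconf(_SC_PAGESIZE);
        for (size_t off = 0; off < len; off += (size_t)pgsz)
                p[off] = 1;     /* first touch faults in the whole page */

        munmap(p, len);
        close(fd);
        return (0);
}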
:> It kinda sounds like the buffer cache is getting blown out, but not
:> having seen the program I can't really analyze it.
:
:See http://aldan.algebra.com/~mi/mzip.c
I can't access this URL, it says 'not found'.
:> It will always be more efficient to write to a file using write() than
:> using mmap()
:
:I understand that write() is much better optimized at the moment, but the
:mmap interface carries some advantages that may allow future OSes to
:optimize further. The application can hint at its planned use of the
:data via madvise(), for example.
Yes, but those advantages are limited by the way memory mapping hardware
works. Some things simply cannot be optimized for lack of sufficient
information.
Reading via mmap() is very well optimized. Making modifications via
mmap() is optimized only insofar as the data is expected to be read,
modified, and written back. It is not possible to optimize for the
case where the data is only written through the mapping, for the reasons
described above. The hardware simply does not give the operating
system enough information to optimize the write-only case.
:Unfortunately, my problem, so far, is with it not writing _at all_...
Not sure what is going on since I can't access the program yet, but
I'd be happy to take a look at the code.
The most common mistake people make when trying to write to a file via
mmap() is forgetting to ftruncate() the file to the proper length
first. Stores into mapped memory beyond the file's EOF are ignored
within the last page, and the program will take a fault if it writes to
mapped pages that lie entirely beyond the file's current EOF. Writing
to mapped memory does *not* extend the size of a file; only
ftruncate() or write() can extend the size of a file.
The second most common mistake is to forget to specify MAP_SHARED
in the mmap() call.
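For reference, the usual pattern looks roughly like this (an illustrative
sketch only; the file name and data are made up):

/*
 * Illustrative sketch: create a file through mmap() by ftruncate()ing it
 * to its final size first and mapping it MAP_SHARED before storing into it.
 */
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int
main(void)
{
        const char msg[] = "hello via mmap\n";
        size_t len = sizeof(msg) - 1;
        int fd = open("out.bin", O_RDWR | O_CREAT | O_TRUNC, 0644);

        if (fd < 0 || ftruncate(fd, (off_t)len) < 0)    /* size it first */
                return (1);
        char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (p == MAP_FAILED)
                return (1);

        memcpy(p, msg, len);    /* stores reach the file via the shared map */
        msync(p, len, MS_SYNC); /* optional: push the dirty page out now */
        munmap(p, len);
        close(fd);
        return (0);
}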
:Yes, this is an example of how a well-implemented mmap can be better than
:write. Without explicit writes by the application and without doubling the
:memory requirements, the data can be written out in the most optimal way.
:...
:Thanks for your help. Yours,
:
: -mi
I don't think mmap()-based writing will EVER be more efficient than
write() except in the case where the entire data set fits into memory
and has been entirely cached by the system. In that one case writing via
mmap() will be faster. In all other cases the system will take as
many VM faults on the pages as it would take system calls
to execute the write()s.
You are making a classic mistake by assuming that the copying overhead
of a write() into the file's backing store, versus directly mmap()ing
the file's backing store, represents a large chunk of the overhead of
the operation. In fact, the copying represents only a small
chunk of the related overhead. The vast majority of the overhead is
always going to be the disk I/O itself.
I/O must occur even in the cached/delayed-write case, so on a busy system
it still represents the greatest overhead from the point of view of
system load. On a lightly loaded system nobody is going to care about
a few milliseconds of improved performance here and there since, by
definition, the system is lightly loaded and thus has plenty of idle
CPU and I/O cycles to spare.
-Matt
Matthew Dillon
<dillon@backplane.com>