Neil Sharman
2007-Oct-02 05:27 UTC
[zfs-code] msync incorrectly returns EDQUOT for mmapped file on zfs for x86 build-72
Hi,

While testing how my application behaves on zfs when the underlying file system fills up, I found that the msync system call can return EDQUOT. The documentation for msync does not indicate that it can return EDQUOT. I've created an example program that illustrates the failure.

If you look at the program you will see that it creates a file, writes 16K to it, and syncs the file. It then mmaps the file and repeatedly msyncs it while filling up the filesystem through the growth of a different file. Eventually one of the msyncs fails with EDQUOT. This failure seems bad and wrong.

I also find it a little strange that it returns EDQUOT rather than ENOSPC, but let's not get into that.

The program requires two file names. It will create both of these files: the first will be 16K in size, and the second will grow to fill the filesystem. Before running the test I suggest creating a very small, size-limited zfs file system and running the test there.

Any comments or feedback welcome.

Neil.

--
This message posted from opensolaris.org

[Attachment scrubbed by the archive: zfs-msync.c, application/octet-stream, 1204 bytes]
URL: <http://mail.opensolaris.org/pipermail/zfs-code/attachments/20071001/df359adf/attachment.obj>
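Because the attachment was scrubbed, the following is only a minimal sketch of a program matching the description above. It is not the original zfs-msync.c. The function name `msync_while_filling`, the 8K growth step, and the `max_fill` cap are assumptions added for illustration; the cap exists so the sketch can be tried without actually filling a disk.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#define MAP_SIZE (16 * 1024) /* size of the mapped file, as described */
#define CHUNK    (8 * 1024)  /* hypothetical growth step for the filler file */

/*
 * Creates a 16K file, syncs it, maps it, then alternates between
 * dirtying + msync()ing the mapping and growing a second file.
 * Returns 0 if every msync succeeded, the errno of the first failing
 * msync otherwise, or -1 if the setup itself failed.
 */
int msync_while_filling(const char *mapped_path, const char *filler_path,
                        off_t max_fill)
{
    static char buf[CHUNK];
    unsigned char stamp = 0;
    off_t filled = 0;
    int result = 0, stop = 0, mfd, ffd = -1, i;
    void *map = MAP_FAILED;

    assert(mapped_path != NULL && filler_path != NULL);
    memset(buf, 'x', sizeof buf);

    mfd = open(mapped_path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (mfd < 0)
        return -1;
    for (i = 0; i < MAP_SIZE / CHUNK; i++)
        if (write(mfd, buf, CHUNK) != CHUNK)
            goto setup_failed;
    if (fsync(mfd) != 0)
        goto setup_failed;

    map = mmap(NULL, MAP_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, mfd, 0);
    ffd = open(filler_path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (map == MAP_FAILED || ffd < 0)
        goto setup_failed;

    for (;;) {
        /* Dirty the whole mapping; the file's size never changes. */
        memset(map, stamp++, MAP_SIZE);
        if (msync(map, MAP_SIZE, MS_SYNC) != 0) {
            result = errno;          /* e.g. EDQUOT or ENOSPC */
            break;
        }
        if (stop)
            break;
        /* Grow the filler; once it cannot grow (or the cap is hit),
         * do one more msync pass and finish. */
        if (filled >= max_fill || write(ffd, buf, CHUNK) != CHUNK)
            stop = 1;
        else
            filled += CHUNK;
    }
    goto cleanup;

setup_failed:
    result = -1;
cleanup:
    if (map != MAP_FAILED)
        munmap(map, MAP_SIZE);
    if (ffd >= 0)
        close(ffd);
    close(mfd);
    return result;
}
```

To approximate the reported scenario, one would call this with two paths on a small, quota-limited file system and a `max_fill` larger than the quota, then inspect the returned errno.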
Chris Kirby
2007-Oct-02 18:35 UTC
[zfs-code] msync incorrectly returns EDQUOT for mmapped file on zfs for x86 build-72
Neil Sharman wrote:
> Hi,
>
> While testing how my application behaves on zfs when the underlying file system fills up, I found that the msync system call can return EDQUOT. The documentation for msync does not indicate that it can return EDQUOT. I've created an example program that illustrates the failure.
>
> If you look at the program you will see that it creates a file, writes 16K to it, and syncs the file. It then mmaps the file and repeatedly msyncs it while filling up the filesystem through the growth of a different file. Eventually one of the msyncs fails with EDQUOT. This failure seems bad and wrong.
>
> I also find it a little strange that it returns EDQUOT rather than ENOSPC, but let's not get into that.

Neil,

Is there a quota set on that filesystem (or on any of its ZFS ancestors)? When I run your test I get ENOSPC, which is to be expected.

-Chris
Neil Sharman
2007-Oct-03 05:41 UTC
[zfs-code] msync incorrectly returns EDQUOT for mmapped file on zfs for x86 buil
Chris,

Yes, there is. Setting a quota was the only way I could think of to put an upper bound on the file system size.

Neil.
Chris Kirby
2007-Oct-03 14:36 UTC
[zfs-code] msync incorrectly returns EDQUOT for mmapped file on zfs for x86 buil
Neil Sharman wrote:
> Chris,
>
> Yes there is. That was the only way I could think of setting an upper
> bound on the file system size.

Ah, OK. That explains EDQUOT vs. ENOSPC. :-)

We could translate those errors into EIO or something documented by the msync man page, but that's less descriptive and potentially confusing. Perhaps we should consider updating the man page.

-Chris
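Until the man page question is settled, a caller can treat EDQUOT and ENOSPC from msync as one "out of space" class next to the errors it already expects. The sketch below is one possible way to do that, not anything from the ZFS code or the man page; the enum, `classify_sync_errno`, and `checked_msync` are hypothetical names, and grouping EINVAL with ENOMEM as "bad argument" is an assumption based on the POSIX description of msync.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical outcome classes for a sync-style call. */
enum sync_outcome {
    SYNC_OK,
    SYNC_OUT_OF_SPACE,  /* ENOSPC or EDQUOT: nothing more can be allocated */
    SYNC_IO_ERROR,      /* EIO: the device or file system reported a failure */
    SYNC_BAD_ARGUMENT,  /* EINVAL/ENOMEM: bad flags, alignment, or range */
    SYNC_OTHER          /* anything else, e.g. EBUSY */
};

/* Maps an errno value (0 meaning success) to an outcome class. */
enum sync_outcome classify_sync_errno(int err)
{
    switch (err) {
    case 0:
        return SYNC_OK;
    case ENOSPC:
    case EDQUOT:
        return SYNC_OUT_OF_SPACE;
    case EIO:
        return SYNC_IO_ERROR;
    case EINVAL:
    case ENOMEM:
        return SYNC_BAD_ARGUMENT;
    default:
        return SYNC_OTHER;
    }
}

/* Synchronous msync over [addr, addr+len) with a classified result. */
enum sync_outcome checked_msync(void *addr, size_t len)
{
    assert(addr != NULL && len > 0);
    return classify_sync_errno(msync(addr, len, MS_SYNC) == 0 ? 0 : errno);
}
```

Keeping the raw errno around for logging is still worthwhile, since the class deliberately hides whether a quota or the pool itself was the limit.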
Neil Sharman
2007-Oct-04 03:35 UTC
[zfs-code] msync incorrectly returns EDQUOT for mmapped file on zfs for x86 buil
Chris,

> We could translate those errors into EIO or something
> documented by the msync man page, but that's less
> descriptive and potentially confusing. Perhaps we
> should consider updating the man page.

As a programmer I would much prefer to have the most descriptive errors possible.

The thing that surprises me is that I get an error that seems to indicate that "I can't use any MORE space". I am not trying to use any MORE space; I am just trying to overwrite the space that I have already been allocated from the file system. If I had created a file with holes in it then I could understand an msync generating an error when it tried to fill the holes. Although perhaps the mmap should have generated the error, not the msync . . . not sure.

It seems to me that this is a backwards incompatibility. My program goes to a great deal of effort to deal with running out of disk space by checking all the writes and dealing with errors. But it does very little checking on msync. The last thing I expected was to find that the disk space that had apparently already been allocated to me through writes had been mysteriously unallocated behind my back.

I just had a thought. If I had not done the msync but had instead just exited my program, would my changes to the mmap region have been written to disk by fsflush, or would fsflush have failed to write my changes? If fsflush failed to write the changes, would it have generated/logged an error?

I had another thought. If I had fsynced rather than msynced, would I have got an error?

Note: I modified my program so that it didn't use mmap. Instead, after I had filled up the disk using the second file, I tried to overwrite the contents of the first using pwrite. These attempts to overwrite using pwrite also failed with EDQUOT. This just seems wrong. I'm not trying to get more disk space, I'm just trying to change the contents of the space that is already mine.

Neil.
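The modified (pwrite-based) program was not posted, so the following is only a sketch of what such an in-place overwrite check might look like. The function name `overwrite_in_place` is hypothetical, as are the choices to return EINVAL when the write would grow the file and EIO when pwrite makes no progress. The size check is there to make the premise explicit: the call never asks for a larger file.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Overwrites the first len bytes of an existing file without extending
 * it, then fsyncs. Returns 0 on success, otherwise the errno of the
 * failing call (on a full, quota-limited zfs this is where EDQUOT
 * would be reported).
 */
int overwrite_in_place(const char *path, const void *data, size_t len)
{
    struct stat st;
    size_t done = 0;
    int fd, err = 0;

    assert(path != NULL && data != NULL);

    fd = open(path, O_WRONLY);
    if (fd < 0)
        return errno;

    if (fstat(fd, &st) != 0)
        err = errno;
    else if ((off_t)len > st.st_size)
        err = EINVAL;   /* hypothetical choice: refuse to grow the file */

    while (err == 0 && done < len) {
        ssize_t n = pwrite(fd, (const char *)data + done, len - done,
                           (off_t)done);
        if (n < 0)
            err = errno;
        else if (n == 0)
            err = EIO;  /* hypothetical choice: no progress, give up */
        else
            done += (size_t)n;
    }

    /* Without fsync the error might only surface later, or never. */
    if (err == 0 && fsync(fd) != 0)
        err = errno;
    if (close(fd) != 0 && err == 0)
        err = errno;
    return err;
}
```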
Richard L. Hamilton
2007-Oct-11 10:18 UTC
[zfs-code] msync incorrectly returns EDQUOT for mmapped file on
> Chris,
>
> > We could translate those errors into EIO or something
> > documented by the msync man page, but that's less
> > descriptive and potentially confusing. Perhaps we
> > should consider updating the man page.
>
> As a programmer I would much prefer to have the most
> descriptive errors possible.
>
> The thing that surprises me is that I get an error
> that seems to indicate that "I can't use any MORE space".
> I am not trying to use any MORE space; I am just trying
> to overwrite the space that I have already been allocated
> from the file system. If I had created a file with holes
> in it then I could understand an msync generating an error
> when it tried to fill the holes. Although perhaps the
> mmap should have generated the error, not the msync . . .
> not sure.
>
> It seems to me that this is a backwards incompatibility.
> My program goes to a great deal of effort to deal with
> running out of disk space by checking all the writes and
> dealing with errors. But it does very little checking on
> msync. The last thing I expected was to find that the disk
> space that had apparently already been allocated to me
> through writes had been mysteriously unallocated behind my
> back.

Is there a snapshot in effect? If so, a write means the old blocks (as seen in the snapshot) have to be retained as well as the new ones; even if the file isn't growing, the amount of space it uses grows. That could cause all sorts of syscalls to get ENOSPC or EDQUOT that don't on other filesystems (and probably very rarely (give or take _very_ tight space) do on zfs if there is no snapshot holding the old blocks).

> I just had a thought. If I had not done the msync but
> had instead just exited my program, would my changes to
> the mmap region have been written to disk by fsflush, or
> would fsflush have failed to write my changes? If fsflush
> failed to write the changes, would it have generated/logged
> an error?

Seems to me that if you couldn't do it, fsflush (or whatever; maybe zfs has its own mechanism?) wouldn't be able to either. No idea if that sort of thing would get logged; even if it should, zfs adds situations where such things could occur that might not previously have arisen, so that might take either an expert or some major code-diving to figure out.

Of course, if you don't use msync()/fsync() (or with write(), some suitable combination of the O_*SYNC flags), your program would never really have any way of knowing whether your write succeeded (there could be an I/O error on any fs type, if the device went bad). The possibility of heretofore unexpected ENOSPC/EDQUOT just means you have one more reason to do what you always should have done.

> I had another thought. If I had fsynced rather than
> msynced, would I have got an error?

I wouldn't mix write()/fsync() with mmap()/msync(); it doesn't necessarily work on all systems, and I think zfs sidesteps the usual page cache. It may still provide the same consistency between the two that other filesystems have historically provided on Solaris, but perhaps at some cost. At the very least, if that's not a best practice, you may be on a code path less often taken, and thus less exercised. :-)

> Note: I modified my program so that it didn't use mmap.
> Instead, after I had filled up the disk using the second
> file, I tried to overwrite the contents of the first using
> pwrite. These attempts to overwrite using pwrite also
> failed with EDQUOT. This just seems wrong. I'm not trying
> to get more disk space, I'm just trying to change the
> contents of the space that is already mine.

Same situation as with the msync(), I think; namely that any sort of write (even to the inode, as in an fchmod() as mentioned in another thread) could fail under certain circumstances with ENOSPC/EDQUOT. Short of a new option to have snapshots get their own private quotas/reservations, I don't see how one could have all the Good Things zfs offers and not have some such effect.
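The space argument above can be illustrated with a toy model. This is emphatically not ZFS's real accounting: `struct toy_fs`, its fields, and both functions are hypothetical, and the model assumes that every overwrite first allocates a new copy of the block and only afterwards releases the old copy, unless a snapshot still references it.

```c
#include <assert.h>
#include <errno.h>

#define TOY_BLOCKS 4 /* blocks in the single file being modelled */

/* Toy copy-on-write dataset holding one file of TOY_BLOCKS blocks. */
struct toy_fs {
    long quota;              /* limit, in blocks */
    long used;               /* blocks currently charged to the dataset */
    int  pinned[TOY_BLOCKS]; /* 1 if a snapshot references block i's current copy */
};

/* Taking a snapshot pins the current copy of every block. */
void toy_snapshot(struct toy_fs *fs)
{
    int i;
    for (i = 0; i < TOY_BLOCKS; i++)
        fs->pinned[i] = 1;
}

/* Overwrites block i in place from the caller's point of view.
 * Returns 0 on success or EDQUOT if the new copy does not fit. */
int toy_overwrite(struct toy_fs *fs, int i)
{
    assert(i >= 0 && i < TOY_BLOCKS);
    if (fs->used + 1 > fs->quota)
        return EDQUOT;       /* no room for the new copy */
    fs->used += 1;           /* new copy is written elsewhere */
    if (fs->pinned[i])
        fs->pinned[i] = 0;   /* old copy stays with the snapshot */
    else
        fs->used -= 1;       /* old copy is released */
    return 0;
}
```

In this model a file of constant size consumes one extra block for each block first rewritten after a snapshot, and an overwrite fails outright once `used` equals `quota`, which is at least consistent with the behaviour reported in this thread.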