Robert Milkowski
2010-Aug-27 23:13 UTC
[zfs-discuss] zfs set readonly=on does not entirely go into read-only mode
Hi,

When I set readonly=on on a dataset, no new files can be created. However, writes to already opened files are still allowed.

This is rather counterintuitive: if I set a filesystem read-only, I would expect it not to allow any modifications.

I think it shouldn't behave this way, and it should be considered a bug.

What do you think?

PS. I tested it on S10u8 and snv_134.

--
Robert Milkowski
http://milek.blogspot.com
Ian Collins
2010-Aug-28 00:05 UTC
[zfs-discuss] zfs set readonly=on does not entirely go into read-only mode
On 08/28/10 11:13 AM, Robert Milkowski wrote:
> Hi,
>
> When I set readonly=on on a dataset then no new files are allowed to
> be created.
> However writes to already opened files are allowed.
>
> This is rather counter intuitive - if I set a filesystem as read-only
> I would expect it not to allow any modifications to it.
>
> I think it shouldn't behave this way and it should be considered as a
> bug.
>
> What do you think?

No.

Think of this from the perspective of an application. How would a write failure be reported? open(2) returns EACCES if the file cannot be written, but there isn't a corresponding return from write(2). Any open file descriptors would have to be updated to reflect the change of access, and the application would end up with an unexpected error return (EBADF?).

If the application has been given permission to open a file for writing and this permission is unexpectedly revoked, strange things may happen. The file being written would be left in an inconsistent state.

I think it is better to let the write operation complete and leave the file in a consistent state.

--
Ian.
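The point that access is checked at open(2) and not again at write(2) can be seen with ordinary file mode bits. What follows is only an analogy, a minimal sketch on a POSIX system: it uses chmod on a temporary file instead of the ZFS readonly property, and the function name write_after_revoking_mode is made up for illustration.

```python
import os
import tempfile


def write_after_revoking_mode():
    """Open a file for writing, drop its write bits, then write via the old descriptor."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "data.txt")
        fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
        try:
            # Remove every write permission bit after the descriptor exists.
            os.chmod(path, 0o444)
            # write(2) is not re-checked against the mode bits, so this succeeds.
            written = os.write(fd, b"still writable\n")
        finally:
            os.close(fd)
        with open(path, "rb") as f:
            return written, f.read()


if __name__ == "__main__":
    print(write_after_revoking_mode())
```

The descriptor obtained before the chmod keeps its write access, which is the same shape as the behavior reported for readonly=on: the check happens when the file is opened.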
Ian Collins
2010-Aug-28 00:36 UTC
[zfs-discuss] zfs set readonly=on does not entirely go into read-only mode
On 08/28/10 12:05 PM, Ian Collins wrote:
> On 08/28/10 11:13 AM, Robert Milkowski wrote:
>> Hi,
>>
>> When I set readonly=on on a dataset then no new files are allowed to
>> be created.
>> However writes to already opened files are allowed.
>>
>> This is rather counter intuitive - if I set a filesystem as read-only
>> I would expect it not to allow any modifications to it.
>>
>> I think it shouldn't behave this way and it should be considered as a
>> bug.
>>
>> What do you think?
>>
> No.
>
> Think of this from the perspective of an application. How would write
> failure be reported? open(2) returns EACCES if the file can not be
> written but there isn't a corresponding return from write(2). Any
> open file descriptors would have to be updated to reflect the change
> of access and the application would end up with an unexpected error
> return (EBADF?).
>
> If the application has been given permission to open a file for
> writing and this permission is unexpectedly revoked, strange things may
> happen. The file being written would be in an inconsistent state.
>
> I think it is better to let write operation complete and leave the
> file in a consistent state.

Following on from my own reply, I think that if there is a bug, it is letting the change occur when there are open files. Setting the filesystem read-only is effectively remounting it (the equivalent on other filesystems), so it should behave in the same way as an unmount in the presence of open files.

--
Ian.
Nicolas Williams
2010-Aug-28 00:44 UTC
[zfs-discuss] zfs set readonly=on does not entirely go into read-only mode
On Sat, Aug 28, 2010 at 12:05:53PM +1200, Ian Collins wrote:
> Think of this from the perspective of an application. How would
> write failure be reported? open(2) returns EACCES if the file can
> not be written but there isn't a corresponding return from write(2).
> Any open file descriptors would have to be updated to reflect the
> change of access and the application would end up with an unexpected
> error return (EBADF?).

EROFS. But write(2) isn't supposed to return EROFS.

NFSv3's and v4's write ops are allowed to return the NFS equivalent of EROFS, and so typically NFS clients do cause write(2) to return EROFS in such cases (but then, NFS isn't fully POSIX).

write(2) can return EIO though, and, IIRC, the BSD revoke(2) syscall arranges for just that to be returned by write(2) calls on revoked fildes.

IMO EROFS and EIO would both be OK. It might be a good idea to require a force option to make a change that would cause non-POSIX behavior.

I'd think that there are many possible ways to handle this:

 a) disallow setting readonly=on on mounted datasets that are readonly=off;

 b) disallow ... but only if there are any fildes open for write (doesn't matter if shared with NFS, as NFS writes are allowed to return EROFS);

 c) allow the change but make it take effect on next mount;

 d) force umount the dataset, make the change, mount again;

 e) have write(2), to fildes open for write before the change to readonly=on, return EROFS after the change;

 f) same as (d) but only if you force the prop change;

 g) have write(2), to fildes open for write before the change to readonly=on, return EIO after the change;

 h) allow write(2)s to fildes open for write before the change to readonly=on;

(h) is current behavior.

(a) and (b) would be reasonable, but if the result is EBUSY, the user may not be able to change the property without drastic steps (such as rebooting, if there are lots of datasets below).

(c) would be confusing, and not that useful.

(d) would be unreasonable (plus what if there are datasets below this one?!).

(e)... may be reasonable if you think that we're well outside POSIX the moment you change the readonly prop to on.

(f) is reasonable (by forcing the change you'd be saying that you're happy to leave POSIX land).

(h) is reasonable.

> If the application has been given permission to open a file for
> writing and this permission is unexpectedly revoked, strange things
> may happen. The file being written would be in an inconsistent
> state.

Well, there's always the BSD revoke(2) system call. Use it and

> I think it is better to let write operation complete and leave the
> file in a consistent state.

There is that too. But you could, too, just power off... The application should use fsync(2) (or fdatasync()) carefully to ensure that failed write(2)s and power failures don't leave the application in an unrecoverable state.

Nico
--
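The closing advice about fsync(2) can be sketched as a write-to-temp, fsync, rename pattern, so that a failed write(2) or a power loss leaves either the old file or the new one, never a half-written mix. This is a minimal illustration under assumptions, not anything posted in the thread: the function name durable_replace and the choice to report EROFS, EIO and ENOSPC by name are invented here.

```python
import errno
import os
import tempfile


def durable_replace(path, data):
    """Replace path with data via temp file + fsync + rename.

    Returns None on success, or the errno name for EROFS/EIO/ENOSPC.
    """
    directory = os.path.dirname(os.path.abspath(path))
    tmp_path = None
    try:
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        try:
            view = memoryview(data)
            while view:  # write(2) may be partial, so loop until done
                n = os.write(fd, view)
                view = view[n:]
            os.fsync(fd)  # force file contents to stable storage
        finally:
            os.close(fd)
        os.replace(tmp_path, path)  # atomic rename over the old file
        tmp_path = None
        dir_fd = os.open(directory, os.O_RDONLY)
        try:
            os.fsync(dir_fd)  # make the rename itself durable
        finally:
            os.close(dir_fd)
        return None
    except OSError as exc:
        if exc.errno in (errno.EROFS, errno.EIO, errno.ENOSPC):
            return errno.errorcode[exc.errno]
        raise
    finally:
        if tmp_path is not None:
            try:
                os.unlink(tmp_path)  # best-effort cleanup of the temp file
            except OSError:
                pass
```

With this shape, an application that suddenly sees EROFS or EIO from write(2) still has its previous file intact, whichever of options (e), (g) or (h) a filesystem implements.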
Edward Ned Harvey
2010-Aug-28 00:45 UTC
[zfs-discuss] zfs set readonly=on does not entirely go into read-only mode
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
> bounces at opensolaris.org] On Behalf Of Ian Collins
>
> > However writes to already opened files are allowed.
>
> Think of this from the perspective of an application. How would write
> failure be reported?

Both very good points. But I agree with Robert.

write() has a known failure mode when the disk is full. I agree bad things can happen to applications that attempt write() when the disk is full ... however ...

Only a user with root privs is able to set the readonly property. I expect the root user is doing this for a reason, and is willing and able to take responsibility for the consequences. The intuitive (generally expected) thing, when you're root and you make a filesystem readonly, is that it becomes readonly.

If that is not the behavior ... Well, I can think of at least one really specific, important example problem.

Suppose an application writes to a file endlessly and fills up the filesystem. This is a known bad thing for ZFS, sometimes causing unrecoverable infinite IO and forcing a power cycle (I don't have a bug # but see here: http://opensolaris.org/jive/thread.jspa?threadID=132383&tstart=0 ) ... If you find yourself in the infinite-IO, would-be-forced-to-power-cycle situation, the workaround is to reduce some reservation to free up space. Then you should be able to rm, destroy, and stop scrub. But if the application is still endlessly writing to the open file handle that it already owns ... then any space you can free up will just get consumed again immediately by the bad application.

Another specific example ...

Suppose you "zfs send" from a primary server to a backup server. You want the filesystems to be readonly on the backup fileserver, in order to receive incrementals. If you make a mistake, and start writing to the backup server filesystem, you want to be able to correct your mistake. Make it readonly, stop anything from writing to it, rollback to the unmodified snapshot, so you're able to receive incrementals again.

If setting readonly doesn't stop open filehandles from writing ... What can you do? You either have to flex your brain muscle to figure out some technique to find which application is performing writes (not always easy to do), or you basically have to unmount and remount the filesystem to force writes to stop, which might not be easy to do, because filehandles are in use. You might feel the need to simply reboot, instead of figuring out a way to do all this. You just complain to your colleagues and say "yeah, the stupid thing made me reboot in order to make the filesystem readonly."
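For the "find which application is performing writes" step, one possible technique is to walk the process table and look at each descriptor's open flags. The sketch below is an assumption-heavy illustration: it relies on the Linux /proc layout (/proc/PID/fd and /proc/PID/fdinfo), which is not the Solaris layout the thread is about, and the function name writers_under is hypothetical.

```python
import os


def writers_under(mount_point, proc_root="/proc"):
    """Return sorted (pid, path) pairs for descriptors open for writing under mount_point.

    Assumes a Linux-style procfs; returns [] when proc_root is unavailable.
    """
    prefix = os.path.realpath(mount_point).rstrip("/") + "/"
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return []
    found = set()
    for entry in entries:
        if not entry.isdigit():
            continue  # only numeric entries are processes
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or we lack permission
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
                with open(os.path.join(proc_root, entry, "fdinfo", fd)) as info:
                    # The "flags:" line holds the open(2) flags in octal.
                    flags = next(
                        int(line.split()[1], 8)
                        for line in info
                        if line.startswith("flags:")
                    )
            except (OSError, StopIteration, ValueError, IndexError):
                continue
            if target.startswith(prefix) and (flags & os.O_ACCMODE) != os.O_RDONLY:
                found.add((int(entry), target))
    return sorted(found)
```

Run as root, writers_under("/some/mountpoint") would list the processes that still hold write-capable descriptors, which is the information needed before deciding whether to stop them or force an unmount. Without root it only sees the caller's own processes.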
Ian Collins
2010-Aug-28 00:56 UTC
[zfs-discuss] zfs set readonly=on does not entirely go into read-only mode
On 08/28/10 12:45 PM, Edward Ned Harvey wrote:
> Another specific example ...
>
> Suppose you "zfs send" from a primary server to a backup server. You want
> the filesystems to be readonly on the backup fileserver, in order to receive
> incrementals. If you make a mistake, and start writing to the backup server
> filesystem, you want to be able to correct your mistake. Make it readonly,
> stop anything from writing to it, rollback to the unmodified snapshot, so
> you're able to receive incrementals again.

I think you have lost a "not" in there somewhere!

I always set all the backup filesystems on our staging server read-only (and atime=off, if that makes any difference to a read-only filesystem). You can still receive to a read-only filesystem, and there's -F to force rollbacks.

The exception is when adding a new nested filesystem; the mount will fail unless the parent is read/write.

--
Ian.
Edward Ned Harvey
2010-Aug-28 00:58 UTC
[zfs-discuss] zfs set readonly=on does not entirely go into read-only mode
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
> bounces at opensolaris.org] On Behalf Of Ian Collins
>
> so it should behave in the same way as an unmount in
> the presence of open files.

+1

You can unmount lazy, or force, or by default, the unmount fails in the presence of open files. (I think.) So to keep everybody happy, let people do whatever they want. ;-) Setting the readonly property should fail in the presence of open files, or you can force it, which would truly pull the rug out from under the writing processes. And if the developer(s) are feeling ambitious, implement lazy too. ;-)
Edward Ned Harvey
2010-Aug-28 01:04 UTC
[zfs-discuss] zfs set readonly=on does not entirely go into read-only mode
> From: Ian Collins [mailto:ian at ianshome.com]
>
> On 08/28/10 12:45 PM, Edward Ned Harvey wrote:
> > Another specific example ...
> >
> > Suppose you "zfs send" from a primary server to a backup server. You want
> > the filesystems to be readonly on the backup fileserver, in order to receive
> > incrementals. If you make a mistake, and start writing to the backup server
> > filesystem, you want to be able to correct your mistake. Make it readonly,
> > stop anything from writing to it, rollback to the unmodified snapshot, so
> > you're able to receive incrementals again.
>
> I think you have lost a "not" in there somewhere!

Didn't miss any "not," but it may not have been written clearly.

If you *intended* to set the destination filesystem readonly before, and you only discovered later that it's not readonly, as evidenced by the fact that something wrote to it and now you can't receive incremental zfs snapshots... Then you want to correct your mistake. Whatever was writing to the backup fileserver, it shouldn't have been. So set the filesystem readonly, rollback to the latest snapshot that corresponds to the primary server, so you can again start receiving incrementals.
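The recovery sequence described here (set readonly, then roll back to the snapshot shared with the primary) can be written down as a dry-run sketch. The code only builds and prints the command lines, it does not run them. The dataset and snapshot names are placeholders, and the helper names recovery_commands and render are made up for illustration.

```python
import shlex


def recovery_commands(dataset, snapshot):
    """Build (not run) the commands for the recovery sequence on the backup server."""
    return [
        # Stop new opens-for-write; per this thread, already-open
        # descriptors may keep writing until their owners are stopped.
        ["zfs", "set", "readonly=on", dataset],
        # Discard stray changes made since the snapshot shared with the primary.
        ["zfs", "rollback", f"{dataset}@{snapshot}"],
    ]


def render(commands):
    """Return each command as a shell-quoted string for review."""
    return [shlex.join(cmd) for cmd in commands]


if __name__ == "__main__":
    for line in render(recovery_commands("backup/data", "2010-08-27")):
        print(line)
```

Plain "zfs rollback" only goes back to the most recent snapshot. If newer snapshots exist on the backup side, rolling back past them needs -r, which destroys those snapshots, so that flag is deliberately left out of the sketch.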