Richard L. Hamilton
2007-Mar-28  10:18 UTC
[zfs-discuss] How big a write to a regular file is atomic?
and does it vary by filesystem type? I know I ought to know the answer, but it's been a long time since I thought about it, and I must not be looking at the right man pages. And also, if it varies, how does one tell? For a pipe, there's fpathconf() with _PC_PIPE_BUF, but how about for a regular file?

This message posted from opensolaris.org
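For reference, the pipe case mentioned in the question can be sketched as follows. This is only an illustration of the fpathconf() call with _PC_PIPE_BUF; the helper names pipe_atomic_limit and query_new_pipe_limit are made up for this example, and nothing here answers the regular-file question.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: returns the atomic write limit (PIPE_BUF) for the
 * pipe or FIFO behind fd, or -1 if the call fails or no limit is defined. */
long pipe_atomic_limit(int fd)
{
    assert(fd >= 0);
    errno = 0;
    return fpathconf(fd, _PC_PIPE_BUF);
}

/* Hypothetical helper: creates a throwaway pipe, queries its limit,
 * and closes both ends again. */
long query_new_pipe_limit(void)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;
    long limit = pipe_atomic_limit(fds[1]);
    close(fds[0]);
    close(fds[1]);
    return limit;
}
```

Calling query_new_pipe_limit() should return a value of at least 512, the minimum POSIX allows for PIPE_BUF; the actual number is system-dependent.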
Manoj Joseph
2007-Mar-28  11:13 UTC
[zfs-discuss] How big a write to a regular file is atomic?
Richard L. Hamilton wrote:
> and does it vary by filesystem type? I know I ought to know the
> answer, but it's been a long time since I thought about it, and
> I must not be looking at the right man pages. And also, if it varies,
> how does one tell? For a pipe, there's fpathconf() with _PC_PIPE_BUF,
> but how about for a regular file?

For ZFS, it is atomic up to the whole-block level. See:
http://www.opensolaris.org/jive/thread.jspa?messageID=18705

-Manoj
Anton B. Rang
2007-Mar-29  01:55 UTC
[zfs-discuss] Re: How big a write to a regular file is atomic?
It's not defined by POSIX (or Solaris). You can rely on being able to atomically write a single disk block (512 bytes); anything larger than that is risky. Oh, and it has to be 512-byte aligned.

File systems with overwrite semantics (UFS, QFS, etc.) will never guarantee atomicity for more than a disk block, because that's the only guarantee from the underlying disks.

File systems with copy-on-write semantics (WAFL, ZFS, etc.) can guarantee atomicity for arbitrarily large writes, but will usually have some limits due to the desire to limit the amount of data which is modified in one transaction.

For ZFS, writes to a single "ZFS block" will be atomic. I believe that a 128K write which does not cross a 128K boundary will always be atomic if you have not set a lower record size for the file system, but I haven't studied the code in detail.
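The "does not cross a boundary" condition above is plain arithmetic on the offset and length. A minimal sketch, assuming the block size is known to the caller (512 for a disk block, 131072 for a default ZFS record, both figures taken from this message); fits_in_one_block is a hypothetical name, and passing the check says nothing about what a given file system actually guarantees.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical helper: true if the byte range [offset, offset + len)
 * lies entirely inside one block_size-aligned block. */
bool fits_in_one_block(off_t offset, size_t len, size_t block_size)
{
    assert(block_size > 0);
    if (offset < 0 || len == 0 || len > block_size)
        return false;
    off_t first = offset / (off_t)block_size;
    off_t last = (offset + (off_t)len - 1) / (off_t)block_size;
    return first == last;
}
```

For example, fits_in_one_block(0, 512, 512) is true, while fits_in_one_block(256, 512, 512) is false because the range straddles two 512-byte blocks.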
Nicolas Williams
2007-Mar-29  05:16 UTC
[zfs-discuss] Re: How big a write to a regular file is atomic?
On Wed, Mar 28, 2007 at 06:55:17PM -0700, Anton B. Rang wrote:
> It's not defined by POSIX (or Solaris). You can rely on being able to
> atomically write a single disk block (512 bytes); anything larger than
> that is risky. Oh, and it has to be 512-byte aligned.
>
> File systems with overwrite semantics (UFS, QFS, etc.) will never
> guarantee atomicity for more than a disk block, because that's the
> only guarantee from the underlying disks.

I thought UFS and others have a guarantee of atomicity for O_APPEND writes vis-a-vis other O_APPEND writes up to some write size. (Of course, NFS does not have true O_APPEND support, so this wouldn't apply to NFS.)

Nico
--
Anton B. Rang
2007-Mar-29  05:33 UTC
[zfs-discuss] Re: How big a write to a regular file is atomic?
I should probably clarify my answer.

All file systems provide writes by default which are atomic with respect to readers of the file. That's a POSIX requirement. In other words, if you're writing ABC, there's no possibility that a reader might see ABD (if D was previously contained in the file) or just AB.

POSIX doesn't provide any means to change this behaviour, but individual file systems may; I believe direct I/O on UFS might, at least for overwrites, and QFS has a specific mount option (which can also be set on an individual file) to allow concurrent readers & writers.

However, when considering atomicity with respect to a system crash, my previous comments stand. (This is the type of atomicity which is important for a database log file, for instance.)
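For the log-file case, a writer that wants to lean only on the single-disk-block guarantee described earlier in this thread might issue one aligned 512-byte write per record and flush it. The following is a sketch under that assumption; SECTOR_SIZE, open_log and write_log_sector are hypothetical names, and the 512-byte figure is the one quoted in the thread, not something queried from the device.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define SECTOR_SIZE 512 /* assumption: disk block size quoted in this thread */

/* Hypothetical helper: opens (or creates) the log file for read/write. */
int open_log(const char *path)
{
    return open(path, O_RDWR | O_CREAT, 0644);
}

/* Hypothetical helper: writes one zero-padded record into sector number
 * `sector` with a single aligned pwrite(), then flushes it with fsync().
 * Returns 0 on success, -1 on error or short write. */
int write_log_sector(int fd, off_t sector, const void *data, size_t len)
{
    assert(len <= SECTOR_SIZE);
    char buf[SECTOR_SIZE];
    memset(buf, 0, sizeof buf);
    memcpy(buf, data, len);
    ssize_t n = pwrite(fd, buf, sizeof buf, sector * SECTOR_SIZE);
    if (n != (ssize_t)sizeof buf)
        return -1;
    return fsync(fd);
}
```

The record is padded to a full sector so that every write starts and ends on a 512-byte boundary; whether the device really commits that sector atomically is outside what this code can check.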
Richard L. Hamilton
2007-Mar-30  10:32 UTC
[zfs-discuss] Re: Re: How big a write to a regular file is atomic?
> On Wed, Mar 28, 2007 at 06:55:17PM -0700, Anton B. Rang wrote:
> > It's not defined by POSIX (or Solaris). You can rely on being able to
> > atomically write a single disk block (512 bytes); anything larger than
> > that is risky. Oh, and it has to be 512-byte aligned.
> >
> > File systems with overwrite semantics (UFS, QFS, etc.) will never
> > guarantee atomicity for more than a disk block, because that's the
> > only guarantee from the underlying disks.
>
> I thought UFS and others have a guarantee of atomicity for O_APPEND
> writes vis-a-vis other O_APPEND writes up to some write size. (Of
> course, NFS does not have true O_APPEND support, so this wouldn't apply
> to NFS.)

That's mainly what I was thinking of, since the overwrite case would get more complicated.
can you guess?
2007-Apr-02  04:19 UTC
[zfs-discuss] Re: How big a write to a regular file is atomic?
> All file systems provide writes by default which are
> atomic with respect to readers of the file. That's a
> POSIX requirement.

Surely, only in the absence of a crash - otherwise, POSIX would require implementation of transactional write semantics in all file systems.

- bill
Anton B. Rang
2007-Apr-02  22:27 UTC
[zfs-discuss] Re: How big a write to a regular file is atomic?
> > All file systems provide writes by default which are
> > atomic with respect to readers of the file.
>
> Surely, only in the absence of a crash - otherwise,
> POSIX would require implementation of transactional
> write semantics in all file systems. Or is that what
> you meant by the last sentence in your post?

Well, that's part of what I meant, but the actual POSIX requirement is that a read which occurs while a write is in progress cannot see a partially-completed write. A system crash would tend to stop any writes in progress, so this particular requirement doesn't seem to apply. :-)

> Update-in-place file systems can certainly support
> larger-than-disk-sector write atomicity - they just
> have to use something like a transaction to do it.

True ... I don't know of any which do, but they could. Maybe I should have said "update-in-place file systems which do not also write data to a separate log" to be more accurate....

Anton
Nicolas Williams
2007-Apr-02  23:45 UTC
[zfs-discuss] Re: How big a write to a regular file is atomic?
On Mon, Apr 02, 2007 at 03:27:39PM -0700, Anton B. Rang wrote:
> > > All file systems provide writes by default which are
> > > atomic with respect to readers of the file.
> >
> > Surely, only in the absence of a crash - otherwise,
> > POSIX would require implementation of transactional
> > write semantics in all file systems. Or is that what
> > you meant by the last sentence in your post?
>
> Well, that's part of what I meant, but the actual POSIX requirement is
> that a read which occurs while a write is in progress cannot see a
> partially-completed write. A system crash would tend to stop any
> writes in progress, so this particular requirement doesn't seem to
> apply. :-)

Richard seems to care about the O_APPEND case. Can you address that?

IIRC O_APPEND writes are atomic w.r.t. each other up to some write size (meaning that for larger append writes two appends can end up interleaving in chunks of the atomic append write size limit).

Nico
--
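To make the O_APPEND case concrete, here is a minimal sketch of an appender that hands each record to the kernel in a single write() call on a descriptor opened with O_APPEND. The name append_record is hypothetical, and the sketch makes no claim about the size up to which concurrent appends stay unmixed; that limit is exactly the open question in this thread and presumably depends on the file system.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: appends one NUL-terminated record to path using a
 * single write() on an O_APPEND descriptor.
 * Returns 0 on success, -1 on error or short write. */
int append_record(const char *path, const char *record)
{
    assert(path != NULL && record != NULL);
    int fd = open(path, O_WRONLY | O_APPEND | O_CREAT, 0644);
    if (fd < 0)
        return -1;
    size_t len = strlen(record);
    ssize_t n = write(fd, record, len);
    int rc = (n == (ssize_t)len) ? 0 : -1;
    if (close(fd) != 0)
        rc = -1;
    return rc;
}
```

A short write is treated as a failure here rather than retried, since a retry would split the record across two write() calls and lose whatever append atomicity the file system offers.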