thr3ads.net - zfs discuss - [zfs-discuss] ZFS, COW, write(2), directIO... [Dec 2005]

Manoj Joseph

2005-Dec-21 15:48 UTC

[zfs-discuss] ZFS, COW, write(2), directIO...

Hi ZFS Team,

I have a couple of questions...

Assume that the maximum slab size that ZFS supports is x. (I am assuming 
there is a maximum.) An application does a (single) write(2) for 2x 
bytes. Does ZFS/COW guarantee that either all the 2x bytes are 
persistent or none at all? Consider a case where there is a panic after 
x bytes has gone to disk and the change propagated to the uber block. Do 
the uber-block and metadata blocks get updated with the entire write(2) 
or nothing?

In other words, does ZFS's 'always consistent on disk' guarantee extend
to data as well as metadata?

I am _not_ talking about cases that result in ENOSPC, EFBIG, EDQUOT etc. 
where I think a partial write is probably ok. But I would be interested 
in knowing ZFS's behaviour in these cases as well.

The other question is about how ZFS does direct IO. Does it do COW for 
direct IO as well?

Referring me to a manual/doc is good enough. :)

Thanks in advance!!

Cheers
Manoj

Casper.Dik at Sun.COM

2005-Dec-21 16:50 UTC

>Assume that the maximum slab size that ZFS supports is x. (I am assuming 
>there is a maximum.) An application does a (single) write(2) for 2x 
>bytes. Does ZFS/COW guarantee that either all the 2x bytes are 
>persistent or none at all? Consider a case where there is a panic after 
>x bytes has gone to disk and the change propagated to the uber block. Do 
>the uber-block and metadata blocks get updated with the entire write(2) 
>or nothing?
>
>In other words, does ZFS's 'always consistent on disk' guarantee extend
>to data as well as metadata.

An important guarantee ZFS makes is that the data is consistent; that is
a guarantee that ufs doesn't make: it only makes somewhat sure that the
metadata does not contain errors.

Everybody knows that "metadata consistent" buys you next to nothing;
well, fsck doesn't fail, but that is about it.  How often have you seen
files updated just prior to a crash end up with bogus content?  It is
especially bad if it happens to /etc/*_* files.

ZFS's consistency guarantee would not be worth much if it did not
group the metadata and data in the same transaction, so that is
what it does.

Casper

Manoj Joseph

2005-Dec-21 17:04 UTC

Hi Casper,

Thanks for your quick reply. Some followup questions. :)

Casper.Dik at Sun.COM wrote:
>>Assume that the maximum slab size that ZFS supports is x. (I am assuming
>>there is a maximum.) An application does a (single) write(2) for 2x
>>bytes. Does ZFS/COW guarantee that either all the 2x bytes are
>>persistent or none at all? Consider a case where there is a panic after
>>x bytes has gone to disk and the change propagated to the uber block. Do
>>the uber-block and metadata blocks get updated with the entire write(2)
>>or nothing?
>>
>>In other words, does ZFS's 'always consistent on disk' guarantee extend
>>to data as well as metadata.
>
> An important guarantee ZFS makes is that the data is consistent; that is
> a guarantee that ufs doesn't make: it only makes somewhat sure that the
> metadata does not contain errors.
>
> Everybody knows that "metadata consistent" buys you next to nothing;
> well, fsck doesn't fail, but that is about it.  How often have you seen
> files updated just prior to a crash end up with bogus content?  It is
> especially bad if it happens to /etc/*_* files.
>
> ZFS's consistency guarantee would not be worth much if it did not
> group the metadata and data in the same transaction, so that is
> what it does.

I thought so too. ;)

man write(2) says it can return with less than or equal to 'nbyte'.
Can ZFS do this too - write less than what you asked it to write(2)?
Can this happen only when it runs out of space?

Writing less than what you ask the FS to write (but >0) gives you
inconsistent data - at least IMHO.

Regards,
Manoj

Casper.Dik at Sun.COM

2005-Dec-21 17:14 UTC

>man write(2) says it can return with less than or equal to 'nbyte'.
>Can ZFS do this too - write less than what you asked it to write(2)?
>Can this happen only when it runs out of space?

Generally, this only happens for devices and not files.

But the write(2) manual page is pretty clear:

     If a write() requests that more bytes be written than there
     is room for -- for example, if the write would exceed the
     process file size limit (see getrlimit(2) and ulimit(2)), the
     system file size limit, or the free space on the device --
     only as many bytes as there is room for will be written. For
     example, suppose there is space for 20 bytes more in a file
     before reaching a limit. A write() of 512-bytes returns 20.
     The next write() of a non-zero number of bytes gives a
     failure return (except as noted for pipes and FIFO below).

>Writing less than what you ask the FS to write (but >0) gives you
>inconsistent data - at least IMHO.

ZFS deals with filesystem inconsistencies only, not application
level inconsistencies.  The application knows that the condition has
arisen and can take the necessary steps to rectify the problem.

The POSIX syscall interface is a given and we cannot change its
behaviour.

Casper
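
[Editorial sketch, not part of the original message.] The man page
example quoted above can be reproduced with a small C program. This is
a minimal sketch under stated assumptions: write_all() and
short_write_demo() are hypothetical names introduced here, the /tmp
path is arbitrary, and the exact errno for the failing write (EFBIG is
what one would typically expect) may differ between systems. It only
illustrates how an application can notice a short write(2) and decide
what to do about it.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not from the thread): call write(2) until all n
 * bytes are written or an error stops progress. Returns the number of
 * bytes written, which may be short; returns -1 only if nothing at all
 * was written. errno describes the error that stopped the loop. */
static ssize_t write_all(int fd, const void *buf, size_t n)
{
    const char *p = buf;
    size_t done = 0;

    assert(buf != NULL);
    while (done < n) {
        ssize_t w = write(fd, p + done, n - done);
        if (w < 0) {
            if (errno == EINTR)
                continue;           /* interrupted, nothing written: retry */
            return done > 0 ? (ssize_t)done : -1;
        }
        if (w == 0)
            break;                  /* no progress; avoid spinning */
        done += (size_t)w;
    }
    return (ssize_t)done;
}

/* Mirrors the man page example: with a 20-byte file size limit, the
 * first 512-byte write() is cut short and the next one fails. Results
 * are stored through the pointers. Returns 0 if the demo could be set
 * up, -1 otherwise. Side effect: SIGXFSZ stays ignored afterwards. */
static int short_write_demo(ssize_t *first, ssize_t *second, int *second_errno)
{
    char path[] = "/tmp/shortwrite.XXXXXX";   /* arbitrary location */
    char buf[512];
    struct rlimit old, lim;
    int fd = mkstemp(path);

    if (fd < 0)
        return -1;
    unlink(path);                   /* file goes away when fd is closed */
    memset(buf, 'x', sizeof buf);

    signal(SIGXFSZ, SIG_IGN);       /* default action would kill the process */
    if (getrlimit(RLIMIT_FSIZE, &old) != 0) {
        close(fd);
        return -1;
    }
    lim = old;
    lim.rlim_cur = 20;              /* lower only the soft limit */
    if (setrlimit(RLIMIT_FSIZE, &lim) != 0) {
        close(fd);
        return -1;
    }

    *first = write(fd, buf, sizeof buf);
    errno = 0;
    *second = write(fd, buf, sizeof buf);
    *second_errno = errno;

    setrlimit(RLIMIT_FSIZE, &old);  /* restore the original soft limit */
    close(fd);
    return 0;
}
```

A caller that needs all-or-nothing behaviour would compare the return
value of write_all() with the requested length and, on a mismatch,
undo or retry at the application level, which is the division of
responsibility described above.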

Jeff Bonwick

2005-Dec-22 03:12 UTC

> man write(2) says it can return with less than or equal to 'nbyte'.
> Can ZFS do this too - write less than what you asked it to write(2)?
> Can this happen only when it runs out of space?

Yes -- any filesystem will do that if it runs out of space.

Regarding atomicity of writes: they're only atomic up to a point.
If some application issues a 1TB write, we can't hold up the rest
of the system waiting for it to complete.  At present, ZFS writes
are atomic up to the whole-block level, i.e. a max of 128k.

If it were useful, we could add a dataset property that indicates
how much stuff we should be willing to batch up in a single tx.
However, there does have to be some limit -- otherwise any ordinary
user could cork up the system by issuing a giant write, thereby
forcing ZFS to accumulate change until the system ran out of memory.

Jeff
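
[Editorial sketch, not part of the original message.] Since a single
write(2) is only atomic up to a block, an application that needs an
all-or-nothing update of a larger file has to arrange it itself. One
common POSIX technique, which is not ZFS-specific and is not proposed
anywhere in this thread, is to write the new contents to a temporary
file, fsync(2) it, and then rename(2) it over the target; rename
replaces the target atomically as seen by other processes. The helper
name replace_file_atomically() and its parameters are hypothetical,
and the sketch assumes both paths are on the same filesystem.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: replace the file at `path` with `len` bytes from
 * `data`, so that readers see either the old or the new contents.
 * `tmp_path` must be on the same filesystem as `path`.
 * Returns 0 on success, -1 on failure (the temporary file is removed).
 * Note: full crash durability of the rename would also require an
 * fsync() of the containing directory, omitted here for brevity. */
static int replace_file_atomically(const char *path, const char *tmp_path,
                                   const void *data, size_t len)
{
    const char *p = data;
    size_t done = 0;
    int fd;

    assert(path != NULL && tmp_path != NULL);
    fd = open(tmp_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    /* Write everything, tolerating short writes and EINTR. */
    while (done < len) {
        ssize_t w = write(fd, p + done, len - done);
        if (w < 0 && errno == EINTR)
            continue;
        if (w <= 0)
            goto fail;
        done += (size_t)w;
    }

    /* Push the new data to stable storage before exposing it. */
    if (fsync(fd) != 0)
        goto fail;
    if (close(fd) != 0) {
        unlink(tmp_path);
        return -1;
    }

    /* Atomically swap the new file into place. */
    if (rename(tmp_path, path) != 0) {
        unlink(tmp_path);
        return -1;
    }
    return 0;

fail:
    close(fd);
    unlink(tmp_path);
    return -1;
}
```

The design choice is to never modify the target in place: a crash
before the rename leaves the old file untouched, and a crash after it
leaves the complete new file.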
