thr3ads.net - zfs discuss - [zfs-discuss] How does ZFS snapshot COW file data? [Jul 2007]

Eric Hamilton

2007-Jul-03 22:57 UTC

[zfs-discuss] How does ZFS snapshot COW file data?

Apologies in advance for the newbie internals question, but could someone please
give me a pointer to how ZFS snapshots cause future modifications to files to be
written to different disk blocks?  I'm looking at OpenSolaris NV bld 66.

How do snapshots interact with open files or files with pages in the OpenSolaris
page cache?  And what are the effects of O_DSYNC on snapshot consistency of open
files?  My general understanding is that ZFS always writes to new locations
(which makes snapshots simple), but does that apply to data pages too?  Does that
mean that dirty mmap pages being paged out go to new places and require metadata
updates as well?

I've found the ZFS tour and documentation and opengrok helpful.
I've got cscope built locally, and I'll dig it out eventually,
but I thought perhaps a kind soul could give me a pointer and maybe others will
learn something interesting too.

Eric
 
 
This message posted from opensolaris.org

Darren Dunham

2007-Jul-03 23:54 UTC


> Apologies in advance for the newbie internals question, but could
> someone please give me a pointer to how ZFS snapshots cause future
> modifications to files to be written to different disk blocks?  I'm
> looking at OpenSolaris NV bld 66.

"Snapshots" don't really cause that.  ZFS never overwrites data, so all
writes are to "different" disk blocks.  Data in a file is never
overwritten directly.

Once this is true, then the creation of snapshots is easier, but the two
are separate.
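
To make that separation concrete, here is a minimal sketch in Python of a
toy pool that never overwrites a block.  It is not ZFS code: ToyPool and its
methods are hypothetical names, and real ZFS uses a tree of block pointers
rather than a flat list.  The only point is that a snapshot can be nothing
more than a retained copy of the old pointers, because writes never touch
the old blocks.

```python
# Toy model only. ToyPool, write_file, snapshot and read_file are
# hypothetical names for illustration; none of this is ZFS source.

class ToyPool:
    def __init__(self):
        self.blocks = {}      # block address -> bytes, never rewritten in place
        self.next_addr = 0
        self.live = {}        # file name -> list of block addresses ("metadata")
        self.snapshots = {}   # snapshot name -> retained copy of that metadata

    def _alloc(self, data):
        # Always hand out a fresh address; existing blocks are left alone.
        addr = self.next_addr
        self.next_addr += 1
        self.blocks[addr] = data
        return addr

    def write_file(self, name, index, data):
        # Build a NEW pointer list that refers to a NEW block.
        ptrs = list(self.live.get(name, []))
        while len(ptrs) <= index:
            ptrs.append(None)
        ptrs[index] = self._alloc(data)
        self.live[name] = ptrs

    def snapshot(self, snap):
        # No per-block "COW flag" is set anywhere: keeping the old pointer
        # lists is enough, since write_file never mutates them.
        self.snapshots[snap] = dict(self.live)

    def read_file(self, name, snap=None):
        table = self.live if snap is None else self.snapshots[snap]
        return b"".join(self.blocks[a] for a in table[name] if a is not None)


pool = ToyPool()
pool.write_file("f", 0, b"old ")
pool.write_file("f", 1, b"data")
pool.snapshot("s1")
pool.write_file("f", 0, b"new ")       # goes to a new block, as every write does

print(pool.read_file("f"))             # b'new data'
print(pool.read_file("f", snap="s1"))  # b'old data'
```

Note that write_file behaves identically before and after the snapshot is
taken, which is the sense in which the two features are separate.
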

> How do snapshots interact with open files or files with pages in the
> OpenSolaris page cache?

I don't believe they do.  Are you thinking of something in particular?

> My general understanding is that ZFS always writes to new locations
> (which makes snapshots simple), but does that apply to data pages too?

All data and metadata (except for the uberblock dance).

> Does that mean that dirty mmap pages being paged out go to new places
> and require metadata updates as well?

Yes.
-- 
Darren Dunham                                           ddunham at taos.com
Senior Technical Consultant         TAOS            http://www.taos.com/
Got some Dr Pepper?                           San Francisco, CA bay area
         < This line left intentionally blank to confuse you. >

Eric Hamilton

2007-Jul-05 22:29 UTC


Thanks, Darren.

I've taken the liberty of reordering my follow-up for clarity.  Just to
be clear, I'm not critiquing ZFS, just trying to learn it by pushing at
some of the corner cases of filesystem-system interactions.  I'm trying
to figure out the implications of various ZFS features when used in
various ways.

>> Does that mean that dirty mmap pages being paged out go to new places
>> and require metadata updates as well?
>
> Yes.

Indeed, given that ZFS always writes to new places (except for the
uberblock, as you noted), that does make snapshots easier and accounts
for there being no explicit code to "set COW on disk blocks", or such.

From a ufs/vxfs background, the thought of allocating additional disk
storage to page out to a memory mapped file or of changing metadata to
point to new blocks and hence needing to write out metadata as part of
paging out modified file data seems foreign to me.  If the metadata
writes can be bundled into the same transaction, perhaps there's no more
serialization latency on a pageout... ?  Is this also another case where
one might get ENOSPC when one doesn't on other filesystems (paging out
to an existing MMF in a full ZFS pool)?

From the comments in the source tour about the ZIL, I did note the
statement that file contents do not go through the ZIL unless needed for
O_DSYNC or fsync() semantics, so I wasn't sure how else they might be
different.

>> How do snapshots interact with open files or files with pages in the
>> OpenSolaris page cache?
>
> I don't believe they do.  Are you thinking of something in particular?

I am generally interested in understanding file consistency and cache
coherency.  I'd like to know what exactly is being snapshotted and what
is consistent within a snapshot.

I subsequently saw an earlier thread on "ZFS consistency guarantee"  
(http://www.opensolaris.org/jive/thread.jspa;?messageID=124809) where 
you and others pointed out that application state is not consistent at a 
snapshot unless the application has been quiesced or otherwise brought 
to a consistent state.  Even then, I'm curious about the interaction
with the OpenSolaris page cache...

As is generally  known and is explained well by Roch Bourbonnais in 
http://blogs.sun.com/roch/entry/nfs_and_zfs_a_fine, NFS places an extra 
requirement for committing writes to stable storage upon file close.  
For local filesystems, a close() will complete without all modified file 
data being written to disk yet.  Does all such file data get into a 
snapshot, or only data that has happened to be pushed out to disk by the 
time of the snapshot?  (e.g. local open(), write(), close(), snapshot).

It does look to me from the comment and call to zil_suspend from within 
dmu_objset_snapshot_one that any changes that have made it to the 
filesystem will get flushed out and included in the snapshot.  This 
should apply to any metadata operations that have completed such as 
link, unlink, etc.  But if the VM system is caching file contents after 
a close (or at least nobody has pushed it out yet), is there any way to 
guarantee that such data makes it into the snapshot?

In my earlier question I was also thinking about other cases, such
as an MMF where the application has written to a page with or without
msync(), or an open file after write() but no fsync().  Depending upon how
the data of closed files is handled, those may be moot.
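
For reference, the cases listed above can be written out as a short Python
sketch.  It uses only standard-library calls (os.open, os.write, os.fsync,
mmap.flush); the helper names are made up for illustration, and O_DSYNC
falls back to 0 on platforms that do not define it.  A reader of the file
sees the same bytes in every case.  The cases differ only in when the data
is forced to stable storage, and the sketch does not answer whether a ZFS
snapshot captures the unforced ones.

```python
import mmap
import os
import tempfile

# O_DSYNC is not available on every platform; 0 means "no extra flag".
O_DSYNC = getattr(os, "O_DSYNC", 0)
FLAGS = os.O_WRONLY | os.O_CREAT | os.O_TRUNC


def write_close_only(path, data):
    """open(), write(), close(): nothing forces the data to disk."""
    fd = os.open(path, FLAGS, 0o600)
    try:
        os.write(fd, data)
    finally:
        os.close(fd)


def write_with_fsync(path, data):
    """open(), write(), fsync(), close(): fsync() returns after the flush."""
    fd = os.open(path, FLAGS, 0o600)
    try:
        os.write(fd, data)
        os.fsync(fd)
    finally:
        os.close(fd)


def write_with_dsync(path, data):
    """open(O_DSYNC), write(), close(): each write() is synchronous."""
    fd = os.open(path, FLAGS | O_DSYNC, 0o600)
    try:
        os.write(fd, data)
    finally:
        os.close(fd)


def mmap_store(path, offset, data, sync):
    """Store into a mapped page; flush() is the msync() equivalent."""
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0) as m:
            m[offset:offset + len(data)] = data
            if sync:
                m.flush()


results = {}
with tempfile.TemporaryDirectory() as d:
    for name, fn in [("close", write_close_only),
                     ("fsync", write_with_fsync),
                     ("dsync", write_with_dsync)]:
        path = os.path.join(d, name)
        fn(path, b"hello world")
        with open(path, "rb") as f:
            results[name] = f.read()

    path = os.path.join(d, "close")
    mmap_store(path, 0, b"HELLO", sync=False)   # dirty page, no msync()
    with open(path, "rb") as f:
        results["mmap"] = f.read()

print(results)
```

Taking a snapshot between any of these steps on a test pool would be one
way to probe the question empirically.
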

Eric







Darren Dunham

2007-Jul-05 23:06 UTC


> >> Does that mean that dirty mmap pages being paged out go to new places
> >> and require metadata updates as well?
> >
> > Yes.
>
> Indeed, given that ZFS always writes to new places (except for the
> uberblock, as you noted), that does make snapshots easier and accounts
> for there being no explicit code to "set COW on disk blocks", or such.
>
> From a ufs/vxfs background, the thought of allocating additional disk
> storage to page out to a memory mapped file or of changing metadata to
> point to new blocks and hence needing to write out metadata as part of
> paging out modified file data seems foreign to me.  If the metadata
> writes can be bundled into the same transaction, perhaps there's no more
> serialization latency on a pageout... ?

That's not something that I've tested at all.  Perhaps you could
investigate?

> Is this also another case where
> one might get ENOSPC when one doesn't on other filesystems (paging out
> to an existing MMF in a full ZFS pool)?

That depends on how you're using snapshots.  Traditionally, they are a
way of overcommitting disk space to multiple uses in the hope that
everything fits.  Note that because of the snapshot guarantee, it is the
snapshot that wins in a conflict, not the live filesystem.

If you need to avoid that, you can use quotas and make sure that your
snapshots plus actual filesystem space do not exceed the pool
capacity.  Then you are not overcommitted, and space should be available
for writes.  Or you can just monitor everything very closely so that
you're never near the edge.
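
The overcommit effect can be illustrated with a toy accounting model in
Python.  This is a sketch under simplifying assumptions, not a description
of ZFS space accounting: ToySpace is a hypothetical class, every block is
the same size, metadata costs nothing, and no space is held in reserve.  It
shows only that an overwrite needs a free block first, and that a snapshot
keeps the old block from being released afterwards.

```python
import errno

# Toy model only. ToySpace and its methods are hypothetical names.

class ToySpace:
    def __init__(self, capacity):
        self.capacity = capacity   # total number of blocks in the toy pool
        self.refs = {}             # block address -> reference count
        self.next_addr = 0
        self.live = []             # block addresses of one live file
        self.snaps = []            # pointer lists pinned by snapshots

    def used(self):
        return len(self.refs)

    def _alloc(self):
        if self.used() >= self.capacity:
            raise OSError(errno.ENOSPC, "toy pool is full")
        addr = self.next_addr
        self.next_addr += 1
        self.refs[addr] = 1
        return addr

    def _release(self, addr):
        self.refs[addr] -= 1
        if self.refs[addr] == 0:
            del self.refs[addr]

    def append(self):
        self.live.append(self._alloc())

    def overwrite(self, index):
        new = self._alloc()        # the new block is allocated first
        old = self.live[index]
        self.live[index] = new
        self._release(old)         # freed only if no snapshot still holds it

    def snapshot(self):
        for addr in self.live:
            self.refs[addr] += 1
        self.snaps.append(list(self.live))


pool = ToySpace(capacity=4)
for _ in range(3):
    pool.append()

pool.overwrite(0)              # succeeds, and the old block is released
used_before = pool.used()      # 3

pool.snapshot()
pool.overwrite(0)              # succeeds, but the snapshot pins the old block
used_after = pool.used()       # 4

try:
    pool.overwrite(1)          # same file size, yet no free block remains
    failed = None
except OSError as e:
    failed = e.errno

print(used_before, used_after, failed == errno.ENOSPC)   # 3 4 True
```

In this model the live file never grows, yet the last overwrite fails
because the snapshot's blocks cannot be reclaimed.
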

> From the comments in the source tour about the ZIL, I did note the
> statement that file contents do not go through the ZIL unless needed for
> O_DSYNC or fsync() semantics, so I wasn't sure how else they might be
> different.
>
> >> How do snapshots interact with open files or files with pages in the
> >> OpenSolaris page cache?
> >
> > I don't believe they do.  Are you thinking of something in particular?
>
> I am generally interested in understanding file consistency and cache
> coherency.  I'd like to know what exactly is being snapshotted and what
> is consistent within a snapshot.
>
> I subsequently saw an earlier thread on "ZFS consistency guarantee"
> (http://www.opensolaris.org/jive/thread.jspa;?messageID=124809) where
> you and others pointed out that application state is not consistent at a
> snapshot unless the application has been quiesced or otherwise brought
> to a consistent state.  Even then, I'm curious about the interaction
> with the OpenSolaris page cache...
>
> As is generally known and is explained well by Roch Bourbonnais in
> http://blogs.sun.com/roch/entry/nfs_and_zfs_a_fine, NFS places an extra
> requirement for committing writes to stable storage upon file close.
> For local filesystems, a close() will complete without all modified file
> data being written to disk yet.  Does all such file data get into a
> snapshot, or only data that has happened to be pushed out to disk by the
> time of the snapshot?  (e.g. local open(), write(), close(), snapshot).

I have no knowledge of what the code in that area does.  My assumption
has been that the equivalent of an fsync() is done prior to the
snapshot.  Someone else will have to comment.
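
That assumption can be stated as a small Python model.  This sketch encodes
the assumption itself, not verified ZFS behaviour: ToyFS, its fields and the
sync_first switch are all hypothetical, and the switch exists only to
contrast the two possible answers to the question.

```python
# Toy model of the ASSUMPTION above ("an fsync() equivalent precedes the
# snapshot"). Hypothetical names throughout; not taken from ZFS source.

class ToyFS:
    def __init__(self):
        self.dirty = {}       # data accepted by write() but only in memory
        self.on_disk = {}     # data in the committed on-disk state
        self.snapshots = {}   # snapshot name -> copy of committed state

    def write(self, name, data):
        # Returns immediately; nothing has been committed yet.
        self.dirty[name] = data

    def sync(self):
        # Commit every pending change together.
        self.on_disk = {**self.on_disk, **self.dirty}
        self.dirty = {}

    def snapshot(self, snap, sync_first=True):
        if sync_first:
            self.sync()       # the assumed behaviour
        self.snapshots[snap] = dict(self.on_disk)


fs = ToyFS()
fs.write("a", b"v1")                         # open(), write(), close()

fs.snapshot("no_sync", sync_first=False)     # only what already hit disk
fs.snapshot("with_sync")                     # pending data flushed first

print(fs.snapshots["no_sync"])               # {}
print(fs.snapshots["with_sync"])             # {'a': b'v1'}
```

If the assumption holds, the open(), write(), close(), snapshot sequence
behaves like the second case; confirming it would need someone who knows the
snapshot code path.
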

-- 
Darren Dunham                                           ddunham at taos.com
Senior Technical Consultant         TAOS            http://www.taos.com/
Got some Dr Pepper?                           San Francisco, CA bay area
         < This line left intentionally blank to confuse you. >
