Apologies in advance for the newbie internals question, but could someone please give me a pointer to how ZFS snapshots cause future modifications to files to be written to different disk blocks? I'm looking at OpenSolaris NV bld 66.

How do snapshots interact with open files or files with pages in the OpenSolaris page cache? And what are the effects of O_DSYNC on snapshot consistency of open files?

My general understanding is that ZFS always writes to new locations (which makes snapshot simple), but does that apply to data pages too? Does that mean that paging out dirty mmap pages go to new places and require metadata updates as well?

I've found the ZFS tour and documentation and opengrok helpful. I've got cscope built locally, and I'll dig it out eventually, but I thought perhaps a kind soul could give me a pointer and maybe others will learn something interesting too.

Eric
> Apologies in advance for the newbie internals question, but could
> someone please give me a pointer to how ZFS snapshots cause future
> modifications to files to be written to different disk blocks? I'm
> looking at OpenSolaris NV bld 66.

Snapshots don't really cause that. ZFS never overwrites data in place, so all writes go to "different" disk blocks; data in a file is never overwritten directly. Once that is true, creating snapshots is easier, but the two are separate.

> How do snapshots interact with open files or files with pages in the
> OpenSolaris page cache?

I don't believe they do. Are you thinking of something in particular?

> My general understanding is that ZFS always writes to new locations
> (which makes snapshot simple), but does that apply to data pages too?

Yes, it applies to all data and metadata (except for the uberblock dance).

> Does that mean that paging out dirty mmap pages go to new places and
> require metadata updates as well?

Yes.

--
Darren Dunham                                      ddunham at taos.com
Senior Technical Consultant         TAOS         http://www.taos.com/
Got some Dr Pepper?                        San Francisco, CA bay area
        < This line left intentionally blank to confuse you. >
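To make the "never overwrite, so snapshots come almost for free" point concrete, here is a toy model in Python. It is only a sketch and not how ZFS is implemented: the names ToyPool and ToyFile are invented for illustration, a "pool" is just an append-only list, and a "snapshot" is just a saved copy of the file's block pointers.

```python
# Toy model only: ToyPool and ToyFile are hypothetical names, not ZFS code.
class ToyPool:
    def __init__(self):
        self.blocks = []          # block address -> contents, never rewritten

    def alloc(self, data):
        self.blocks.append(data)  # every write lands in a fresh block
        return len(self.blocks) - 1


class ToyFile:
    def __init__(self, pool, contents):
        self.pool = pool
        # The list of block addresses plays the role of the file's metadata.
        self.block_ptrs = [pool.alloc(c) for c in contents]

    def write(self, index, data):
        # Copy-on-write: allocate a new block, then repoint the metadata.
        self.block_ptrs[index] = self.pool.alloc(data)

    def snapshot(self):
        # Only the pointers are copied; the old blocks are simply kept.
        return tuple(self.block_ptrs)

    def read(self, ptrs=None):
        ptrs = self.block_ptrs if ptrs is None else ptrs
        return [self.pool.blocks[p] for p in ptrs]


pool = ToyPool()
f = ToyFile(pool, ["a0", "b0"])
snap = f.snapshot()
f.write(1, "b1")
print(f.read())       # live view:     ['a0', 'b1']
print(f.read(snap))   # snapshot view: ['a0', 'b0']
```

Note that write() has no special case for "a snapshot exists". The write path is the same either way, which matches the observation that the two features are separate.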
Thanks, Darren. I've taken the liberty of reordering my follow-up for clarity. Just to be clear, I'm not critiquing ZFS, just trying to learn it by pushing at some of the corner cases of filesystem-system interactions. I'm trying to figure out the implications of various ZFS features when used in various ways.

>> Does that mean that paging out dirty mmap pages go to new places and
>> require metadata updates as well?
>
> Yes.

Indeed, given that ZFS always writes to new places (except for the uberblock, as you noted), that does make snapshots easier, and it accounts for there being no explicit code to "set COW on disk blocks" or the like.

Coming from a ufs/vxfs background, the thought of allocating additional disk storage to page out to a memory-mapped file, or of changing metadata to point to new blocks and hence needing to write out metadata as part of paging out modified file data, seems foreign to me. If the metadata writes can be bundled into the same transaction, perhaps there's no more serialization latency on a pageout...?

Is this also another case where one might get ENOSPC when one doesn't on other filesystems (paging out to an existing MMF in a full ZFS pool)?

From the comments in the source tour about the ZIL, I did note the statement that file contents do not go through the ZIL unless needed for O_DSYNC or fsync() semantics, so I wasn't sure how else they might be different.

>> How do snapshots interact with open files or files with pages in the
>> OpenSolaris page cache?
>
> I don't believe they do. Are you thinking of something in particular?

I am generally interested in understanding file consistency and cache coherency. I'd like to know what exactly is being snapshotted and what is consistent within a snapshot.
I subsequently saw an earlier thread on "ZFS consistency guarantee" (http://www.opensolaris.org/jive/thread.jspa;?messageID=124809) where you and others pointed out that application state is not consistent at a snapshot unless the application has been quiesced or otherwise brought to a consistent state. Even then, I'm curious about the interaction with the OpenSolaris page cache...

As is generally known and is explained well by Roch Bourbonnais in http://blogs.sun.com/roch/entry/nfs_and_zfs_a_fine, NFS places an extra requirement for committing writes to stable storage upon file close. For local filesystems, a close() can complete before all modified file data has been written to disk. Does all such file data get into a snapshot, or only data that happens to have been pushed out to disk by the time of the snapshot? (e.g. local open(), write(), close(), snapshot).

It does look to me, from the comment and the call to zil_suspend within dmu_objset_snapshot_one, that any changes that have made it to the filesystem will get flushed out and included in the snapshot. This should apply to any metadata operations that have completed, such as link, unlink, etc. But if the VM system is caching file contents after a close (or at least nobody has pushed them out yet), is there any way to guarantee that such data makes it into the snapshot?

In my earlier question I was also thinking about other cases, such as an MMF where the application has written to a page with or without msync(), or an open file after write() but no fsync(). Depending upon how data of closed files is handled, those may be moot.

Eric
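For readers less familiar with the calls named above, here is a minimal Python sketch of the two explicit flush paths an application has: write() followed by fsync(), and a shared mapping followed by msync() (which Python exposes as mmap.flush()). It does not answer what a ZFS snapshot contains when these calls are skipped, which is the open question in this thread; it only shows the standard POSIX-level calls. The helper names write_durably and patch_via_mmap are invented for this example.

```python
import mmap
import os
import tempfile


def write_durably(path, payload):
    """write() then fsync(): returns only after the OS reports the file's
    data as pushed to stable storage."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.write(fd, payload)   # small payload; a full version would loop
        os.fsync(fd)
    finally:
        os.close(fd)


def patch_via_mmap(path, offset, payload):
    """Modify a file through a shared mapping, then flush the mapping."""
    with open(path, "r+b") as fh:
        with mmap.mmap(fh.fileno(), 0) as mm:
            mm[offset:offset + len(payload)] = payload
            mm.flush()          # msync() on POSIX systems


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "demo.dat")
    write_durably(path, b"hello world")
    patch_via_mmap(path, 0, b"HELLO")
    with open(path, "rb") as fh:
        result = fh.read()

print(result)   # b'HELLO world'
```

Opening the file with O_DSYNC (os.O_DSYNC in Python, where the platform defines it) asks for each write() to behave roughly as if it were followed by a data sync, so the explicit os.fsync() call becomes unnecessary for the data itself.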
> >> Does that mean that paging out dirty mmap pages go to new places and
> >> require metadata updates as well?
> >
> > Yes.
>
> Indeed, given that ZFS always writes to new places (except for the
> uberblock, as you noted), that does make snapshots easier, and it
> accounts for there being no explicit code to "set COW on disk blocks"
> or the like.
>
> Coming from a ufs/vxfs background, the thought of allocating additional
> disk storage to page out to a memory-mapped file, or of changing
> metadata to point to new blocks and hence needing to write out metadata
> as part of paging out modified file data, seems foreign to me. If the
> metadata writes can be bundled into the same transaction, perhaps
> there's no more serialization latency on a pageout...?

That's not something that I've tested at all. Perhaps you could investigate?

> Is this also another case where one might get ENOSPC when one doesn't
> on other filesystems (paging out to an existing MMF in a full ZFS
> pool)?

That depends on how you're using snapshots. Traditionally, they are a way of overcommitting disk space to multiple uses in the hope that everything fits. Note that because a snapshot's contents are guaranteed, it is the snapshot that wins in a conflict over space, not the live filesystem.

If you need to avoid that, you can use quotas and make sure that your snapshots plus actual filesystem space do not exceed the pool capacity. Then you are not overcommitted, and space should be available for writes. Or you can just monitor everything very closely so that you're never near the edge.

> From the comments in the source tour about the ZIL, I did note the
> statement that file contents do not go through the ZIL unless needed
> for O_DSYNC or fsync() semantics, so I wasn't sure how else they might
> be different.
>
> >> How do snapshots interact with open files or files with pages in the
> >> OpenSolaris page cache?
> >
> > I don't believe they do. Are you thinking of something in particular?
>
> I am generally interested in understanding file consistency and cache
> coherency. I'd like to know what exactly is being snapshotted and what
> is consistent within a snapshot.
>
> I subsequently saw an earlier thread on "ZFS consistency guarantee"
> (http://www.opensolaris.org/jive/thread.jspa;?messageID=124809) where
> you and others pointed out that application state is not consistent at a
> snapshot unless the application has been quiesced or otherwise brought
> to a consistent state. Even then, I'm curious about the interaction
> with the OpenSolaris page cache...
>
> As is generally known and is explained well by Roch Bourbonnais in
> http://blogs.sun.com/roch/entry/nfs_and_zfs_a_fine, NFS places an extra
> requirement for committing writes to stable storage upon file close.
> For local filesystems, a close() can complete before all modified file
> data has been written to disk. Does all such file data get into a
> snapshot, or only data that happens to have been pushed out to disk by
> the time of the snapshot? (e.g. local open(), write(), close(), snapshot).

I have no knowledge of what the code in that area does. My assumption has been that the equivalent of an fsync() is done prior to the snapshot, but I have not verified it. Someone else will have to comment.

--
Darren Dunham                                      ddunham at taos.com
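On the ENOSPC question earlier in this reply, a toy Python model may help show why a snapshot can turn an in-place rewrite into an out-of-space error. This is a sketch under simplifying assumptions, not ZFS's actual allocator: ToyPool and rewrite_block are invented names, space is counted in whole blocks, and a block becomes free only when neither the live file nor a snapshot refers to it.

```python
import errno


class ToyPool:
    """Fixed number of blocks; a block is reusable only when nothing,
    live file or snapshot, still points at it. Hypothetical model."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.refs = {}            # block address -> reference count
        self.next_addr = 0

    def alloc(self):
        if len(self.refs) >= self.capacity:
            raise OSError(errno.ENOSPC, "toy pool is full")
        addr = self.next_addr
        self.next_addr += 1
        self.refs[addr] = 1
        return addr

    def hold(self, addr):
        self.refs[addr] += 1      # a snapshot takes a reference

    def release(self, addr):
        self.refs[addr] -= 1
        if self.refs[addr] == 0:
            del self.refs[addr]   # block is free again


def rewrite_block(pool, live, index):
    """Rewrite one block of a live file: new block first, then drop the old."""
    new = pool.alloc()            # may raise ENOSPC
    pool.release(live[index])
    live[index] = new


pool = ToyPool(capacity=3)
live = [pool.alloc(), pool.alloc()]
rewrite_block(pool, live, 0)            # no snapshot: the old block is freed
blocks_in_use_before = len(pool.refs)

snapshot = list(live)                   # snapshot pins the current blocks
for addr in snapshot:
    pool.hold(addr)

rewrite_block(pool, live, 0)            # takes the last free block
try:
    rewrite_block(pool, live, 1)        # nothing left: the snapshot wins
    outcome = "ok"
except OSError as exc:
    outcome = errno.errorcode[exc.errno]

print(blocks_in_use_before, outcome)    # 2 ENOSPC
```

In this model the file never grows, yet the second rewrite after the snapshot fails, because the snapshot keeps the old blocks allocated. Sizing quotas so that the live data plus the snapshots fit in the pool, as suggested above, is what keeps a free block available for the rewrite.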