Hello all,

I recently heard an argument from a colleague that "ZFS mis-uses the term COW" (Copy-On-Write). According to him, the original term was introduced by some vendors and was to be taken literally: whenever a new write comes in to update an existing logical block in the storage, the block's old contents are first copied away to another physical location (e.g. to be used for snapshotting or for recovery from an untimely poweroff/panic), and then the original on-disk location is rewritten with the new data.

Arguably, while this incurs a hit when rewriting existing data, it combats fragmentation and speeds up reads (i.e. all pieces of the file's "live" version are stored as contiguously as possible). This may be important for large objects randomly updated "inside", like VM disk images and iSCSI backing stores, precreated database table files, maybe swapfiles, etc.

I understand why ZFS does what it does, and how, but such subtle differences in terminology may cause misunderstanding between people of the same trade. At the least, I'd keep this possibility in mind when talking to non-Solaris storage admins ;)

I wonder if this use of the term is indeed the more valid one (making a copy of old data upon a new write), and whether any vendors actually implemented the procedure outlined above?

Thanks,
//Jim Klimov
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
> bounces at opensolaris.org] On Behalf Of Jim Klimov
>
> I recently heard an argument from a colleague that "ZFS mis-uses
> the term COW" (Copy-On-Write). According to him, the original term
> was introduced by some vendors and was to be taken literally: that
> is, whenever a new write comes to update an existing logical block
> in the storage, the block's old contents are first copied away to
> another physical location (i.e. to be used for snapshotting or for
> recovery of untimely poweroff/panic), then the original on-disk
> location is rewritten with the new data.

What you described (actually copying the disk sectors upon a request to overwrite them) is what MS does. It may seem more intuitive to call this COW from a "files" perspective, but COW is a computer-science term that was used for memory before it was ever used for disk. The ZFS behavior follows the traditional meaning of COW with regard to memory management.

http://en.wikipedia.org/wiki/Copy-on-write

> Arguably, while this incurs a hit when rewriting existing data,
> this combats fragmentation and speeds up reads (i.e. all pieces of
> the file's "live" version are stored as contiguously as possible).
> This may be important for large objects randomly updated "inside",
> like VM disk images and iSCSI backing stores, precreated database
> table files, maybe swapfiles, etc.

Correct. Pay now or pay later. In some cases, pay now is better in the long run, and in some cases, pay later is better in the long run.
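The distinction being debated here can be made concrete with a toy model. This is a minimal sketch, not any vendor's actual implementation: the function names (`copy_before_write`, `redirect_on_write`) and the dict-based "block device" are purely illustrative. It contrasts the literal reading of COW (copy the old block aside, then overwrite in place) with the ZFS-style approach (write the new version elsewhere and repoint the block map, leaving the old block for any snapshot that references it).

```python
# Toy model of the two snapshot/write strategies discussed in this thread.
# All names and data structures are hypothetical; blocks are dict entries.

def copy_before_write(blocks, snapshot, addr, new_data):
    """'Literal' COW: save the old contents aside, then overwrite in place.
    Costs an extra read+write on the first overwrite after a snapshot;
    the live data stays at its original (contiguous) addresses."""
    if addr not in snapshot:            # first overwrite since the snapshot
        snapshot[addr] = blocks[addr]   # copy old contents elsewhere
    blocks[addr] = new_data             # rewrite the original location

def redirect_on_write(blocks, block_map, lba, new_data, next_free):
    """ZFS-style COW: never overwrite live data. Write the new version to
    a fresh location and switch the pointer; the old block simply stays
    where it is for any snapshot that still references it."""
    blocks[next_free] = new_data
    block_map[lba] = next_free          # pointer switch commits the write
    return next_free + 1                # next allocation cursor

# Usage sketch:
blocks, snap = {0: b"old"}, {}
copy_before_write(blocks, snap, 0, b"new")
assert blocks[0] == b"new" and snap[0] == b"old"

blocks2, bmap = {0: b"old"}, {0: 0}
redirect_on_write(blocks2, bmap, 0, b"new", next_free=1)
assert blocks2[bmap[0]] == b"new" and blocks2[0] == b"old"  # old block intact
```

The "pay now or pay later" trade-off falls out directly: the first variant pays an extra copy on every first overwrite, the second pays nothing at write time but lets the live file's blocks scatter.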
On Tue, Jun 5, 2012 at 6:32 AM, Jim Klimov <jimklimov at cos.ru> wrote:

> I recently heard an argument from a colleague that "ZFS mis-uses
> the term COW" (Copy-On-Write). According to him, the original term
> was introduced by some vendors and was to be taken literally: that
> is, whenever a new write comes to update an existing logical block
> in the storage, the block's old contents are first copied away to
> another physical location (i.e. to be used for snapshotting or for
> recovery of untimely poweroff/panic), then the original on-disk
> location is rewritten with the new data.

This is what I have seen "traditional" filesystems (UFS, VxFS) do when dealing with snapshots. Once a snapshot is taken, for any data that is being re-written, a copy of the original must be made before committing the write.

> Arguably, while this incurs a hit when rewriting existing data,

The hit to write performance can be substantial, and the space to store each snapshot's data can also be large. This is one of the big differences between ZFS and the others: the cost (in both write performance and space) of snapshots in ZFS is minimal, while for traditional filesystems it can be huge (depending on the number of snapshots).

> this combats fragmentation and speeds up reads (i.e. all pieces of
> the file's "live" version are stored as contiguously as possible).

As long as the file has not grown beyond the original allocation segment. Once you grow out of that, you are (usually) fragmented.

> This may be important for large objects randomly updated "inside",
> like VM disk images and iSCSI backing stores, precreated database
> table files, maybe swapfiles, etc.

--
{--------1---------2---------3---------4---------5---------6---------7---------}
Paul Kraus
-> Senior Systems Architect, Garnet River ( http://www.garnetriver.com/ )
-> Assistant Technical Director, LoneStarCon 3 (http://lonestarcon3.org/)
-> Sound Coordinator, Schenectady Light Opera Company ( http://www.sloctheater.org/ )
-> Technical Advisor, Troy Civic Theatre Company
-> Technical Advisor, RPI Players
COW goes back at least to the early days of virtual memory and fork(). On fork() the kernel would arrange for writable pages in the parent process to be made read-only, so that writes to them could be caught; the page fault handler would then copy the page (and restore write access), so the parent and child each end up with their own private copies.

COW as used in ZFS is not the same, but that concept was also introduced very early, IIRC in the mid-80s -- certainly no later than BSD 4.4's log-structured filesystem (which ZFS resembles in many ways).

So, is COW a misnomer? Yes and no, and anyway, it's irrelevant. The important thing is that when you say COW, people understand that you're not saving a copy of the old thing but rather writing the new thing to a new location. (The old version of whatever was copied-on-write is stranded, unless -- of course -- you still have references to it from things like snapshots.)

Nico
--
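The fork() behavior described above can be observed from user space, even though the page-level mechanics stay hidden in the kernel. A minimal POSIX-only demonstration (assuming a Unix-like system where os.fork() is available): after the fork, parent and child logically share the same data; when the child writes, the kernel gives it a private copy of the affected page, so the parent's view is untouched.

```python
import os

# After fork(), parent and child share pages marked read-only; the kernel
# copies a page only when one side writes to it (copy-on-write), so each
# process ends up with a private copy of anything it modifies.
data = bytearray(b"parent")

pid = os.fork()
if pid == 0:                          # child process
    data[0:6] = b"child!"             # this write triggers the page copy
    os._exit(0 if bytes(data) == b"child!" else 1)
else:                                 # parent process
    _, status = os.waitpid(pid, 0)
    assert os.WEXITSTATUS(status) == 0    # child saw its own modification
    assert bytes(data) == b"parent"       # parent's copy was never touched
```

The ZFS analogue inverts the bookkeeping: instead of copying the old page so the writer can modify it in place, ZFS writes the new version to a new location, but in both cases a write is what triggers the divergence between the old and new views.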