Peter Eriksson
2007-Oct-10 12:59 UTC
[zfs-code] System call to create a clone of a file on a ZFS filesystem?
At lunch today we started talking about a feature that would be nice to have: a system call, somewhat similar to link(2), that gives you a cloned copy of a file which (initially) shares the same data blocks on disk but uses copy-on-write to create private copies as soon as something is modified. It could be useful, for example, for a mail server using maildirs, so that the mail delivery program sending a message to multiple users could use that syscall instead of writing N copies of the same mail. (It would save space until the users start to modify the files.) Anyway, just a brainstorm idea that came up... :-)

--
This message posted from opensolaris.org
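A minimal sketch of what such a call would do, written as a toy Python model rather than real ZFS or kernel code. `Pool`, `File`, `clone()` and `write_block()` are hypothetical names: a file is just a list of block pointers, cloning copies the pointer list but not the data, and a write allocates a new block for that file only (copy-on-write). Freeing blocks that are no longer referenced is deliberately left out, since that turns out to be the hard part.

```python
class Pool:
    """Hypothetical block store: block id -> data."""

    def __init__(self):
        self.blocks = {}
        self._next_id = 0

    def alloc(self, data: bytes) -> int:
        bid = self._next_id
        self._next_id += 1
        self.blocks[bid] = data
        return bid


class File:
    """Hypothetical file: an ordered list of block pointers into a Pool."""

    def __init__(self, pool: Pool, ptrs: list):
        self.pool = pool
        self.ptrs = ptrs

    @classmethod
    def create(cls, pool: Pool, chunks: list) -> "File":
        return cls(pool, [pool.alloc(c) for c in chunks])

    def clone(self) -> "File":
        # Copy the pointer list only; both files now share every data block.
        return File(self.pool, list(self.ptrs))

    def write_block(self, i: int, data: bytes) -> None:
        # Copy-on-write: give this file a fresh block instead of modifying
        # the (possibly shared) old one. The old block is NOT freed here.
        self.ptrs[i] = self.pool.alloc(data)

    def read(self) -> bytes:
        return b"".join(self.pool.blocks[b] for b in self.ptrs)


if __name__ == "__main__":
    pool = Pool()
    msg = File.create(pool, [b"Subject: hi\n", b"body\n"])
    copies = [msg.clone() for _ in range(3)]  # three "deliveries", no data copied
    print(len(pool.blocks))                   # 2
    copies[0].write_block(1, b"edited\n")     # only copies[0] diverges
    print(len(pool.blocks))                   # 3
    print(msg.read())                         # b'Subject: hi\nbody\n'
```

The clone costs one pointer list rather than N copies of the data; space grows only as individual copies are modified.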
Robert Milkowski
2007-Oct-11 07:52 UTC
Hello Peter,

Wednesday, October 10, 2007, 1:59:20 PM, you wrote:

PE> It could be nice to use for example for a mail server using
PE> maildir:s so that the mail delivery program sending a mail to
PE> multiple users could use that syscall instead of writing N copies of the same mail...

While I fully second the idea (it was discussed on zfs-discuss some time ago), for an email platform it wouldn't be that easy, since every delivered file is different (at least the headers differ).

But there are definitely other uses where it would be useful and would improve the user experience.

I haven't looked into the details, but in theory one should be able to copy/move a file within the same datapool between datasets without having to actually copy data blocks... or maybe there's some detail which actually makes it hard to implement...

--
Best regards,
 Robert Milkowski    mailto:rmilkowski at task.gda.pl
                     http://milek.blogspot.com
Matthew Ahrens
2007-Oct-11 08:10 UTC
Robert Milkowski wrote:
> I haven't looked into details but in theory one should be
> able to copy/move a file within the same datapool between datasets
> without having to actually copy data blocks... or maybe there's some
> detail which actually makes it hard to implement...

Once a block is referenced by multiple filesystems, it is nontrivial to determine when it can be freed.

--matt
Robert Milkowski
2007-Oct-11 09:47 UTC
Hello Matthew,

Thursday, October 11, 2007, 9:10:13 AM, you wrote:

MA> Once a block is referenced by multiple filesystem, it is nontrivial to
MA> determine when it can be freed.

In a way, multiple snapshots are separate file systems, or clones... What's the difference? However, I'm sure you're right...

--
Best regards,
 Robert Milkowski
Pawel Jakub Dawidek
2007-Oct-11 10:27 UTC
On Thu, Oct 11, 2007 at 10:47:44AM +0100, Robert Milkowski wrote:
> In a way multiple snapshots are separate file systems, or clones...
> What's the difference? However I'm sure you right...

Snapshots and clones are not autonomous datasets. A clone always has a parent; you can use 'zfs promote' to reverse the relationship, but you cannot make them independent, AFAIK.

To Matthew: as I understand it, Robert was talking more about moving the blocks to another dataset, not about creating a hardlink-like situation; only one dataset would reference the blocks after the move.

--
Pawel Jakub Dawidek    http://www.wheel.pl
pjd at FreeBSD.org     http://www.FreeBSD.org
FreeBSD committer      Am I Evil? Yes, I Am!
Robert Milkowski
2007-Oct-11 13:39 UTC
Hello Pawel,

Thursday, October 11, 2007, 11:27:07 AM, you wrote:

PJD> To Matthew: As I understand it, Robert was talking more about moving the
PJD> blocks to another dataset, not creating a hardlink-like situation - only
PJD> one dataset will reference the blocks after the move.

Yep, for move that's what I had in mind. I was also talking about zfscopy...

--
Best regards,
 Robert
Matthew Ahrens
2007-Oct-11 16:49 UTC
Pawel Jakub Dawidek wrote:
> On Thu, Oct 11, 2007 at 10:47:44AM +0100, Robert Milkowski wrote:
>> In a way multiple snapshots are separate file systems, or clones...
>> What's the difference? However I'm sure you right...

Well, snapshots are nontrivial too. See
http://blogs.sun.com/ahrens/entry/is_it_magic

> To Matthew: As I understand it, Robert was talking more about moving the
> blocks to another dataset, not creating a hardlink-like situation - only
> one dataset will reference the blocks after the move.

Well, he said "copy/move". "Copy" implied to me that both filesystems would reference the same blocks. And even if it is just "move", you still have the issue of snapshots of the original filesystem referencing the file. Changing the snapshots so they no longer reference it? Also nontrivial.

--matt
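Part of why snapshots stay tractable is that every block records the transaction group (txg) in which it was written, and a dataset's snapshots form one ordered chain. Below is a simplified Python sketch, hedged as my reading of the scheme described in the linked post, of the decision made when the live filesystem stops using a block. `release_block` and its parameters are hypothetical names, and destroying snapshots is not modelled: a block born after the most recent snapshot cannot be in any snapshot and is freed at once; otherwise it goes on the dataset's dead list.

```python
from typing import List, Optional, Tuple


def release_block(block_id: int,
                  birth_txg: int,
                  latest_snapshot_txg: Optional[int],
                  deadlist: List[Tuple[int, int]]) -> str:
    """Called when the live filesystem no longer references a block.

    Simplified model (hypothetical names): a snapshot taken in txg T
    contains only blocks born in txg <= T.
    """
    if latest_snapshot_txg is None or birth_txg > latest_snapshot_txg:
        # Born after the newest snapshot: no snapshot can reference it.
        return "free"
    # Some snapshot may still reference it; remember it on the dead list
    # so it can be reconsidered when snapshots are destroyed.
    deadlist.append((block_id, birth_txg))
    return "deadlist"


if __name__ == "__main__":
    dead = []
    print(release_block(7, birth_txg=120, latest_snapshot_txg=100, deadlist=dead))  # free
    print(release_block(3, birth_txg=90, latest_snapshot_txg=100, deadlist=dead))   # deadlist
    print(dead)  # [(3, 90)]
```

Nothing in this check looks at any other dataset, so a block shared with an unrelated dataset would not be protected by it. That is one way to see why cross-dataset sharing is harder than snapshots.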
Pawel Jakub Dawidek
2007-Oct-11 18:46 UTC
On Thu, Oct 11, 2007 at 09:49:51AM -0700, Matthew Ahrens wrote:
> Well, he said "copy/move". "copy" implied to me that both filesystems would
> reference the same blocks. And even if it is just "move", you still have the
> issue of snapshots from the original filesystem referencing it. Changing the
> snapshots so they no longer reference the file? Also nontrivial.

I'm sorry for trying to be too helpful :) I understand it's not trivial, but being able to reference the same block from different datasets would be a really nice feature to have. The functionality discussed above is only one example. Another example would be block aggregation (which has its own name I can't recall right now): we could run a thread once a day that frees duplicated blocks and makes datasets point at one copy only.

--
Pawel Jakub Dawidek
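The nightly "block aggregation" pass (called de-duplication later in the thread) could look roughly like this toy Python sketch. The data model is hypothetical: `blocks` maps block ids to contents and `datasets` maps each dataset to its block pointers. It groups blocks by the SHA-256 of their contents, confirms each match by comparing the bytes, repoints every dataset at one kept copy, and frees the rest. It assumes `datasets` lists every reference to every block, snapshots included; finding all of those references is exactly the nontrivial part Matthew points out.

```python
import hashlib


def dedup_pass(blocks: dict, datasets: dict) -> int:
    """Make all datasets share one copy of each distinct block.

    blocks:   block id -> bytes (duplicates are deleted in place)
    datasets: dataset name -> list of block ids (rewritten in place)
    Assumes `datasets` holds every reference to every block.
    Returns the number of duplicate blocks freed.
    """
    kept_by_digest = {}  # sha256 digest -> ids of blocks kept so far
    remap = {}           # duplicate id -> id of the identical kept block
    for bid in sorted(blocks):
        data = blocks[bid]
        digest = hashlib.sha256(data).digest()
        for kept in kept_by_digest.get(digest, []):
            if blocks[kept] == data:  # confirm byte-for-byte; don't trust the hash alone
                remap[bid] = kept
                break
        else:
            kept_by_digest.setdefault(digest, []).append(bid)

    for ptrs in datasets.values():
        ptrs[:] = [remap.get(b, b) for b in ptrs]  # point at the kept copy
    for dup in remap:
        del blocks[dup]                             # free the duplicates
    return len(remap)


if __name__ == "__main__":
    # Hypothetical example: three jails that each installed the same package.
    blocks = {1: b"apache-new", 2: b"apache-new", 3: b"site-config", 4: b"apache-new"}
    datasets = {"jail1": [1, 3], "jail2": [2], "jail3": [4]}
    print(dedup_pass(blocks, datasets))  # 2
    print(datasets)  # {'jail1': [1, 3], 'jail2': [1], 'jail3': [1]}
```

Running the pass again on the result frees nothing, so it is safe to schedule repeatedly.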
Darren J Moffat
2007-Oct-12 10:10 UTC
Pawel Jakub Dawidek wrote:
> Another example
> would be block aggregation (which has its own name I can't recall right
> now), so we can run a thread once a day that frees duplicated blocks and
> make datasets to point at one copy only.

NTFS can do this within a single filesystem.

It feels to me almost like the opposite of ditto blocks :-) Though I would still want ditto blocks to work with this.

--
Darren J Moffat
Pawel Jakub Dawidek
2007-Oct-12 11:09 UTC
On Fri, Oct 12, 2007 at 11:10:31AM +0100, Darren J Moffat wrote:
> NTFS can do this within a single filesystem.
>
> It feels to me almost like the opposite of ditto blocks :-) Though I
> would still want ditto blocks to work with this.

My main use would be for things like Solaris zones or FreeBSD jails. It is nice to have one base file system which you can just clone, but over time the clones get bigger and bigger. If you sell virtual web servers, for example, and all your customers upgrade Apache to a new version, you end up with X copies of the same blocks and lose everything you saved by using clones initially. Being able to run a process in the background every night that aggregates the blocks back would be really nice.

--
Pawel Jakub Dawidek
Matthew Ahrens
2007-Oct-12 18:07 UTC
Pawel Jakub Dawidek wrote:
> Beeing able to run a
> process in the background every night, which will aggregate the blocks
> back would be really nice.

Yeah, that would be nice. De-duplication is on the list of problems we'd like to attack sooner rather than later.

--matt
Torsten "Paul" Eichstädt
2007-Oct-25 16:37 UTC
Quick shot: what's wrong with maintaining a reference count? Free the block only when its reference count is zero. Count the bytes in each fs it's in, but only once in (each of, recursively) the parent(s). Maybe this is costly, because a parent's size is then no longer the sum of its children's.

I don't know about the mathematical characteristics of the ZFS checksum, but I assume it's good at detecting bit errors and might not be good enough for finding data suitable for aggregation.

Paul
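Paul's suggestion can be sketched as a toy in-memory Python model (class, method and dataset names are hypothetical, not ZFS interfaces). Each block carries a reference count and is freed only when the count drops to zero; each dataset is charged for every block it references; and a parent charges a shared block only once, so its usage is no longer the sum of its children's. It leaves out what would presumably be the costly part in a real filesystem: keeping such counts consistent on disk and updating them on every clone, snapshot and free.

```python
from collections import defaultdict


class RefcountedPool:
    """Toy model of per-block reference counting across datasets."""

    def __init__(self):
        self.size = {}                    # block id -> size in bytes
        self.refcount = defaultdict(int)  # block id -> number of datasets referencing it
        self.refs = defaultdict(set)      # dataset name -> block ids it references

    def add_ref(self, dataset: str, bid: int, size: int) -> None:
        self.size.setdefault(bid, size)
        if bid not in self.refs[dataset]:  # a dataset counts as one reference
            self.refs[dataset].add(bid)
            self.refcount[bid] += 1

    def drop_ref(self, dataset: str, bid: int) -> bool:
        """Drop one dataset's reference; return True if the block was freed."""
        self.refs[dataset].remove(bid)  # KeyError if the dataset never referenced it
        self.refcount[bid] -= 1
        if self.refcount[bid] == 0:
            del self.refcount[bid]
            del self.size[bid]
            return True
        return False

    def used(self, dataset: str) -> int:
        # A dataset is charged for every block it references.
        return sum(self.size[b] for b in self.refs[dataset])

    def used_by_parent(self, children: list) -> int:
        # A parent counts each shared block once, so this can be less than
        # the sum of used() over its children.
        blocks = set().union(*(self.refs[c] for c in children))
        return sum(self.size[b] for b in blocks)


if __name__ == "__main__":
    pool = RefcountedPool()
    pool.add_ref("tank/a", 1, 128 * 1024)  # block 1 shared by both datasets
    pool.add_ref("tank/b", 1, 128 * 1024)
    pool.add_ref("tank/b", 2, 4 * 1024)
    print(pool.used("tank/a"), pool.used("tank/b"))   # 131072 135168
    print(pool.used_by_parent(["tank/a", "tank/b"]))  # 135168
    print(pool.drop_ref("tank/a", 1))                 # False: tank/b still uses it
    print(pool.drop_ref("tank/b", 1))                 # True: last reference gone
```

On the checksum question, a de-duplication pass does not have to rely on the checksum being collision-free: it can use the checksum only to find candidate pairs and then compare the block contents before sharing them.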