While I understand that the filesystems are really separate,
is there any reason why a "mv" from one fs to another in the same
pool requires all data to be copied?

Is there any good reason to deny in-pool, cross fs hardlinks?
(I can understand how move cannot see that this is possible)

Casper
On Thu, Dec 01, 2005 at 06:41:36PM +0100, Casper.Dik at sun.com wrote:
> While I understand that the filesystems are really separate,
> is there any reason why a "mv" from one fs to another in the same
> pool requires all data to be copied?
>
> Is there any good reason to deny in-pool, cross fs hardlinks?
> (I can understand how move cannot see that this is possible)

The biggest issue with these is accounting for the storage.

I.e., with filesystems /foo1 and /foo2:

    mkfile 100m /foo1/tmp
    ln /foo1/tmp /foo2/tmp

How much space is used in /foo1? /foo2? Now do:

    rm /foo1/tmp

Now how much space is used?

The other issue is standards compliance:

    The rename() function will fail if:
    ...
    EXDEV   The links named by old and new are on different
            file systems.

and:

    The link() function will fail if:
    ...
    EXDEV   The link named by new and the file named by
            existing are on different logical devices
            (file systems).

Cheers,
- jonathan

--
Jonathan Adams, Solaris Kernel Development
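Both behaviors are easy to see on a scratch pool. A minimal sketch, assuming a hypothetical pool named tank with default mountpoints (the exact error text varies by release):

    zfs create tank/foo1
    zfs create tank/foo2
    mkfile 100m /tank/foo1/tmp

    # a cross-filesystem hard link is refused with EXDEV
    ln /tank/foo1/tmp /tank/foo2/tmp

    # mv only falls back to copy-and-unlink because rename(2)
    # fails with EXDEV across the filesystem boundary
    mv /tank/foo1/tmp /tank/foo2/tmp

    # per-filesystem accounting before and after tells the story
    zfs list tank/foo1 tank/foo2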
On Thu, Dec 01, 2005 at 06:41:36PM +0100, Casper.Dik at sun.com wrote:
> While I understand that the filesystems are really separate,
> is there any reason why a "mv" from one fs to another in the same
> pool requires all data to be copied?

This is something we plan on attempting. As Jonathan points out, you'd have to state (by setting a property or something) that you don't want strict POSIX compliance. But as you've guessed, this would be a really useful feature.

> Is there any good reason to deny in-pool, cross fs hardlinks?
> (I can understand how move cannot see that this is possible)

Cross-filesystem hardlinks are a different story. If you can come up with an answer to how snapshots and accounting should behave, we'd love to be able to implement this. So far, though, we haven't been able to figure out a good set of semantics.

This is Bill's second law of ZFS: Snapshots ruin almost every good idea I've ever had. :) But they're so damned useful.

--Bill
> While I understand that the filesystems are really separate,
> is there any reason why a "mv" from one fs to another in the same
> pool requires all data to be copied?

Yes. Each filesystem has its own object number space. Suppose that the file I want to link to is object 37 in filesystem A. If I try to create a hard link to it from filesystem B, what I'll actually end up with is a link to object 37 *in filesystem B*. This is pretty fundamental to the way directories work, both locally and over NFS.

We could make cross-filesystem hard links work if all filesystems in a pool shared the same object number space, but that would create a whole different set of problems.

Jeff
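You can see the per-filesystem number space from userland: ls -i prints the object (inode) number, and the same number can legitimately show up in two filesystems of one pool. An illustration with invented names and numbers:

    ls -i /tank/A/passwd /tank/B/vacation.jpg
    #    37 /tank/A/passwd
    #    37 /tank/B/vacation.jpg
    # A directory entry records only the object number (37), not the
    # filesystem it lives in, so a hard link in B pointing at "37"
    # would resolve to B's object 37 -- the wrong file.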
> > While I understand that the filesystems are really separate,
> > is there any reason why a "mv" from one fs to another in the same
> > pool requires all data to be copied?

In addition to what Jeff said, there is the more obvious (to me anyway) issue that the new filesystem may have different compression and checksum algorithms set, and in the (hopefully not too distant) future, different crypto algorithms and keys.

--
Darren J Moffat
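A quick way to see how per-filesystem these settings are, using the same hypothetical pool as above:

    # each filesystem carries its own on-disk encoding policy, so a
    # block moved between them would have to be re-encoded anyway
    zfs set compression=on tank/foo1
    zfs set checksum=sha256 tank/foo2
    zfs get compression,checksum tank/foo1 tank/foo2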
On 12/1/05, Jonathan Adams <jonathan.adams at sun.com> wrote:
> On Thu, Dec 01, 2005 at 06:41:36PM +0100, Casper.Dik at sun.com wrote:
> > While I understand that the filesystems are really separate,
> > is there any reason why a "mv" from one fs to another in the same
> > pool requires all data to be copied?

The one reason I can come up with for this is: if I forget to enable compression on a filesystem, how else am I going to compress all my data? Though I could imagine a daemon that could be enabled to search for compressible data and make recommendations that compression be enabled, or perhaps just compress data on the filesystem if doing so improves performance, or at least doesn't hurt it.

James Dickens
uadmin.blogspot.com
On Fri, Dec 02, 2005 at 03:23:53PM -0600, James Dickens wrote:
> The one reason I can come up with for this is: if I forget to enable
> compression on a filesystem, how else am I going to compress all my
> data? Though I could imagine a daemon that could be enabled to search
> for compressible data and make recommendations that compression be
> enabled, or perhaps just compress data on the filesystem if doing so
> improves performance, or at least doesn't hurt it.

You can just re-copy all of the data after enabling compression. It's fairly easy to write a script, or just do something like:

    find . -xdev -type f | cpio -ocB | cpio -idmuv

to re-write all of the data.

The ZFS folks have talked about having additional "scrub-like" tasks you could run on a pool or filesystem basis; a "recompress" one would be a possibility. The main problem is snapshots; if any exist, then you'll be paying for both the uncompressed and compressed versions.

Cheers,
- jonathan

--
Jonathan Adams, Solaris Kernel Development
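As later messages in this thread point out, that pipeline also rewrites hard-linked files as independent copies. A slightly more careful sketch, assuming a hypothetical filesystem tank/foo mounted at /tank/foo, restricts itself to singly-linked files and then checks the result:

    cd /tank/foo
    zfs set compression=on tank/foo

    # rewrite only regular files with a single link, so existing
    # hard links elsewhere in the tree are left untouched
    find . -xdev -type f -links 1 | cpio -ocB | cpio -idmuv

    # the rewritten blocks are compressed; compressratio shows
    # the achieved compression for the dataset
    zfs get compressratio tank/foo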
On Fri, 2005-12-02 at 21:37, Jonathan Adams wrote:
> The ZFS folks have talked about having additional "scrub-like" tasks you
> could run on a pool or filesystem basis; a "recompress" one would be a
> possibility. The main problem is snapshots; if any exist, then you'll
> be paying for both the uncompressed and compressed versions.

The other area where something like this may be needed is for cryptographic key-change rollover, though that is much more complex, and we will certainly have issues with snapshots and clones to deal with in this area.

--
Darren J Moffat
Jonathan Adams <jonathan.adams at sun.com> wrote:
> You can just re-copy all of the data after enabling compression. It's
> fairly easy to write a script, or just do something like:
>
>     find . -xdev -type f | cpio -ocB | cpio -idmuv
>
> to re-write all of the data.

.... and to destroy the content of all files > 5k.

Jörg

--
EMail: joerg at schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
       js at cs.tu-berlin.de (uni)
       schilling at fokus.fraunhofer.de (work)
Blog: http://schily.blogspot.com/
URL: http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily
On Mon, Dec 05, 2005 at 01:06:37PM +0100, Joerg Schilling wrote:
> Jonathan Adams <jonathan.adams at sun.com> wrote:
>
> > You can just re-copy all of the data after enabling compression. It's
> > fairly easy to write a script, or just do something like:
> >
> >     find . -xdev -type f | cpio -ocB | cpio -idmuv
> >
> > to re-write all of the data.
>
> .... and to destroy the content of all files > 5k.

Did you try the command? CPIO writes to a temporary file and then renames.

Cheers,
- jonathan

--
Jonathan Adams, Solaris Kernel Development
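Assuming the extract-to-temp-and-rename behavior Jonathan describes, you can observe it from the shell, because the rename gives the file a new object number (run from the file's directory; the file name is hypothetical):

    ls -i file     # note the object (inode) number
    echo file | cpio -ocB | cpio -idmuv
    ls -i file     # a different number: cpio extracted to a temporary
                   # file and renamed it over the original instead of
                   # overwriting the original's contents in place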
Jonathan Adams <jonathan.adams at sun.com> wrote:
> On Mon, Dec 05, 2005 at 01:06:37PM +0100, Joerg Schilling wrote:
> > .... and to destroy the content of all files > 5k.
>
> Did you try the command? CPIO writes to a temporary file and then renames.

OK, in theory I know this, but why is this missing from the documentation?

And BTW: this is the only reason why using an outdated command like cpio for BFU makes sense at all.

Also, the documentation for -u is incorrect. It does not mention that cpio always extracts dirs, even when they are not newer in the archive.

Jörg
>>>>> "JS" == Joerg Schilling <schilling at fokus.fraunhofer.de> writes:

JS> OK, in theory I know this, but why is this missing from the
JS> documentation?

Because it's an implementation detail? I'm assuming that the cpio | cpio thing Jonathan mentioned isn't explicitly mentioned in the ZFS documentation either.

Matt

--
Matt Simmons - simmonmt at eng.sun.com | Solaris Kernel - New York
Matthew Simmons <simmonmt at eng.sun.com> wrote:
> Because it's an implementation detail? I'm assuming that the cpio | cpio
> thing Jonathan mentioned isn't explicitly mentioned in the ZFS
> documentation either.

Do you believe that an "implementation detail" that may break hard links should be omitted from the man page?

Jörg
>>>>> "JS" == Joerg Schilling <schilling at fokus.fraunhofer.de> writes:

JS> Do you believe that an "implementation detail" that may break hard
JS> links should be omitted from the man page?

If it breaks hard links, then perhaps the detail should be changed/fixed.

Something that, by the way, we couldn't do had we described the detail in the manpages, thus setting it in stone for all eternity.

Matt

--
Matt Simmons - simmonmt at eng.sun.com | Solaris Kernel - New York
On Tue 06 Dec 2005 at 01:29PM, Joerg Schilling wrote:
> Also, the documentation for -u is incorrect. It does not mention that
> cpio always extracts dirs, even when they are not newer in the archive.

Please file a documentation bug at http://bugs.opensolaris.org.

-dp

--
Daniel Price - Solaris Kernel Engineering - dp at eng.sun.com - blogs.sun.com/dp
Matthew Simmons <simmonmt at eng.sun.com> wrote:
> If it breaks hard links, then perhaps the detail should be changed/fixed.
>
> Something that, by the way, we couldn't do had we described the detail in
> the manpages, thus setting it in stone for all eternity.

Does this mean that the behavior was changed to the current one to allow BFU?

Jörg
>>>>> "JS" == Joerg Schilling <schilling at fokus.fraunhofer.de> writes:

JS> Does this mean that the behavior was changed to the current one to
JS> allow BFU?

What?

I was saying that, if the current behavior is broken, we should fix it. I was also pointing out that, by not documenting the implementation detail in question, we actually have the ability to fix it. Had we documented the detail, life would be more difficult.

Matt

--
Matt Simmons - simmonmt at eng.sun.com | Solaris Kernel - New York
Matthew Simmons <simmonmt at eng.sun.com> wrote:
> I was saying that, if the current behavior is broken, we should fix it. I
> was also pointing out that, by not documenting the implementation detail
> in question, we actually have the ability to fix it. Had we documented
> the detail, life would be more difficult.

OK, so if cpio does not write _into_ existing files, and if the archive does not contain _all_ hard links for a file, the extraction will break hard links.

This is why star will never do something like this by default.

Jörg
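The failure mode is easy to reproduce under the same assumption, with invented file names:

    echo data > a
    ln a b            # a and b share one object
    ls -li a b        # same inode number, link count 2

    # archive and re-extract only one of the two links
    echo a | cpio -ocB | cpio -idmuv

    ls -li a b        # a now has a new inode with link count 1, while
                      # b still names the old object: the link is broken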
> > You can just re-copy all of the data after enabling compression. It's
> > fairly easy to write a script, or just do something like:
> >
> >     find . -xdev -type f | cpio -ocB | cpio -idmuv
> >
> > to re-write all of the data.
>
> .... and to destroy the content of all files > 5k.

I tried the above for fun on /, got tons of "File .... was modified while being copied", and ended up reinstalling my system from scratch :D (not a problem, it was just a Nexenta test installation)

Is there a reliable method of re-compressing a whole zfs volume after turning on compression or changing the compression scheme?

roland
roland wrote:
> Is there a reliable method of re-compressing a whole zfs volume after
> turning on compression or changing the compression scheme?

It would be slow, and the file system would need to be idle to avoid race conditions, but you _could_ do the following (POSIX shell syntax). I haven't tested this, so it could have typos or other problems:

    find . -type f -print | while IFS= read -r n; do
        TF="$(mktemp "${n%/*}/.tmpXXXXXX")"
        if cp -p "$n" "$TF"; then
            if ! mv "$TF" "$n"; then
                echo "failed to re-write $n in mv"
                rm "$TF"
            fi
        else
            echo "failed to re-write $n in cp"
            rm "$TF"
        fi
    done

--
Carson
I have had some success using zfs send/recv into a child of a compressed filesystem to do this, although you have the disadvantage of losing your settings.

Basically:

    zfs create tank/foo
    (mv a bunch of files into foo)
    zfs create tank/bar
    zfs set compression=on tank/bar
    zfs snapshot tank/foo@now
    zfs send tank/foo@now | zfs recv tank/bar/foosmall
    zfs destroy -r tank/foo
    zfs set compression=on tank/bar/foosmall
    zfs rename tank/bar/foosmall tank/foo

Kinda clunky, and you have to have twice as much space available, and there are probably other issues with it, as I am not a pro zfs user here, but it worked for me =)

Asa
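To confirm the rewrite actually bought you something, still assuming the hypothetical tank/foo layout:

    zfs get compression,compressratio tank/foo
    zfs list -o name,used,refer tank/foo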