thr3ads.net - zfs discuss - [zfs-discuss] zfs rewrite? [Jan 2007]

Pawel Jakub Dawidek

2007-Jan-27 01:34 UTC

[zfs-discuss] zfs rewrite?

Hi.

What do you guys think about implementing a 'zfs/zpool rewrite' command?
It'll read every block older than the date when the command was executed
and write it again (using the standard ZFS COW mechanism, similar to how
resilvering works, but the data is read from the same disk it is written to).

I see a few situations where it might be useful:

1. My file system is almost full (or not) and I'd like to enable
compression on it. Unfortunately, compression only applies to data written
from now on, and I'd also like to compress the data already stored. Here
comes 'zfs rewrite'!

2. I was a bad boy and turned off checksumming. Now I suspect something
is corrupting my data and I'd really like to checksum everything. OK, here
comes 'zfs rewrite'!

3. I created a file system with a huge amount of data, most of which
is read-only. I move my server from an Intel to a sparc64 machine.
Adaptive endianness only changes the byte order to native on write, and
because the file system is mostly read-only, it'll need to byteswap all
the time. And here comes 'zfs rewrite'!

4. I'm not sure how ZFS traverses the block tree, but if it is done per
file, the rewrite may move the blocks of each file closer together, which
will reduce seek times. Because of the way ZFS works, data may
become fragmented, and 'zfs rewrite' could be used for
defragmentation.

5. Once file system encryption is implemented, this mechanism can be
used to encrypt an existing file system, and also to change the
encryption key.
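
To make the proposal concrete, here is a toy sketch of the idea in Python. It is only an illustration under loud assumptions: the Block class, the integer "birth" timestamp, and the use of zlib as the compressor are all hypothetical stand-ins, and nothing here reflects how ZFS actually stores or traverses blocks.

```python
from dataclasses import dataclass
import zlib


@dataclass
class Block:
    birth: int        # hypothetical logical time of the last write
    payload: bytes    # bytes as stored "on disk"
    compressed: bool  # whether payload is zlib-compressed


def read_block(block):
    """Return the logical data of a block, undoing compression if needed."""
    return zlib.decompress(block.payload) if block.compressed else block.payload


def write_block(data, now, compress):
    """Write data as a brand-new block using the current settings."""
    payload = zlib.compress(data) if compress else data
    return Block(birth=now, payload=payload, compressed=compress)


def rewrite(blocks, started_at, compress):
    """Rewrite every block older than started_at; newer blocks are left alone."""
    result = []
    for block in blocks:
        if block.birth < started_at:
            result.append(write_block(read_block(block), started_at, compress))
        else:
            result.append(block)
    return result


# Two blocks written before compression was enabled, one written after.
pool = [Block(1, b"a" * 4096, False), Block(2, b"b" * 4096, False)]
pool.append(write_block(b"c" * 4096, now=10, compress=True))

after = rewrite(pool, started_at=10, compress=True)
print(sum(len(b.payload) for b in pool), "->", sum(len(b.payload) for b in after))
```

The same loop would cover the checksum and encryption cases: the data is unchanged, only the settings applied at write time differ.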

What do you think?

--
Pawel Jakub Dawidek http://www.wheel.pl
pjd at FreeBSD.org http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!

Darren Dunham

2007-Jan-27 02:08 UTC

> What do you guys think about implementing 'zfs/zpool rewrite' command?
> It'll read every block older than the date when the command was executed
> and write it again (using standard ZFS COW mechanism, similar to how
> resilvering works, but the data is read from the same disk it is written
> to).

#1 How do you control I/O overhead?

#2 Snapshot blocks are never rewritten at the moment.  Most of your
   suggestions seem to imply working on the "live" data, but doing that
   for snapshots as well might be tricky.

> 3. I created file system with huge amount of data, where most of the
> data is read-only. I change my server from intel to sparc64 machine.
> Adaptive endianness only change byte order to native on write and because
> file system is mostly read-only, it'll need to byteswap all the time.
> And here comes 'zfs rewrite'!

It's only the metadata that is modified anyway, not the file data.  I
would hope that this could be done more easily than a full tree rewrite
(and again the issue with snapshots).  Also, the overhead there probably
isn't going to be very high (since the metadata will be cached in most
cases).

Other than that, I'm guessing something like this will be necessary to
implement disk evacuation/removal.  If you have to rewrite data from one
disk to elsewhere in the pool, then rewriting the entire tree shouldn't
be much harder.

-- 
Darren Dunham                                           ddunham at taos.com
Senior Technical Consultant         TAOS            http://www.taos.com/
Got some Dr Pepper?                           San Francisco, CA bay area
         < This line left intentionally blank to confuse you. >

Toby Thain

2007-Jan-27 02:27 UTC

On 26-Jan-07, at 11:34 PM, Pawel Jakub Dawidek wrote:
> Hi.
>
> What do you guys think about implementing 'zfs/zpool rewrite' command?
> It'll read every block older than the date when the command was
> executed
> and write it again (using standard ZFS COW mechanism, similar to how
> resilvering works, but the data is read from the same disk it is
> written to).
>
> I see few situations where it might be useful:
>
> 1. My file system is almost full (or not) and I'd like to enable
> compression on it. Unfortunately compression will work from now on and
> I'd also like to compress already stored data. Here comes 'zfs
> rewrite'!
>
> 2. I was bad boy and turned off checksumming. Now I suspect something
> corrupts my data and I'd really like to checksum everything. Ok, here
> comes 'zfs rewrite'!

In this case you deserve what you get.

>
> 3. I created file system with huge amount of data, where most of the
> data is read-only. I change my server from intel to sparc64 machine.
> Adaptive endianness only change byte order to native on write and
> because
> file system is mostly read-only, it'll need to byteswap all the time.
> And here comes 'zfs rewrite'!

Why would this help? (Obviously file data is never 'swapped'.)

--T
>
> 4. Not sure how ZFS traverse blocks tree, if it is done based on
> files,
> it may be used to move data from one file closer to each other, which
> will reduce seek times. Because of the way how ZFS works, the data may
> become fragmented and 'zfs rewrite' could be used for defragmentation.
>
> 5. Once file system encryption is implemented, this mechanism can be
> used to encrypt existing file system and also it can be used to change
> encryption key.
>
> What do you think?
>

Frank Cusack

2007-Jan-27 06:57 UTC

On January 27, 2007 12:27:17 AM -0200 Toby Thain <toby at smartgames.ca> wrote:
> On 26-Jan-07, at 11:34 PM, Pawel Jakub Dawidek wrote:
>> 3. I created file system with huge amount of data, where most of the
>> data is read-only. I change my server from intel to sparc64 machine.
>> Adaptive endianness only change byte order to native on write and
>> because
>> file system is mostly read-only, it'll need to byteswap all the time.
>> And here comes 'zfs rewrite'!
>
> Why would this help? (Obviously file data is never 'swapped').

Metadata (incl checksums?) still has to be byte-swapped.  Or would
atime updates also force a metadata update?  Or am I totally mistaken.

-frank

Jeff Bonwick

2007-Jan-27 07:27 UTC

On Fri, Jan 26, 2007 at 10:57:19PM -0800, Frank Cusack wrote:
> On January 27, 2007 12:27:17 AM -0200 Toby Thain <toby at smartgames.ca> wrote:
> >On 26-Jan-07, at 11:34 PM, Pawel Jakub Dawidek wrote:
> >>3. I created file system with huge amount of data, where most of the
> >>data is read-only. I change my server from intel to sparc64 machine.
> >>Adaptive endianness only change byte order to native on write and
> >>because
> >>file system is mostly read-only, it'll need to byteswap all the time.
> >>And here comes 'zfs rewrite'!
> >
> >Why would this help? (Obviously file data is never 'swapped').
>
> Metadata (incl checksums?) still has to be byte-swapped.  Or would
> atime updates also force a metadata update?  Or am I totally mistaken.

You're all correct.  File data is never byte-swapped.  Most metadata
needs to be byte-swapped, but it's generally only 1-2% of your space.
So the overhead shouldn't be significant, even if you never rewrite.

An atime update will indeed cause a znode rewrite (unless you run
with "zfs set atime=off"), so znodes will get rewritten by reads.

The only other non-trivial metadata is the indirect blocks.
All files up to 128k are stored in a single block: ZFS has
variable blocksize from 512 bytes to 128k, so a 35k file consumes
exactly 35k (not, say, 40k as it would with a fixed 8k blocksize).
Single-block files have no indirect blocks, and hence no metadata
other than the znode.  So all that remains is the indirect blocks
for files larger than 128k -- which is to say, not very much.
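
The 35k arithmetic above can be checked with a short Python sketch. The helper names are made up for illustration, and the model is deliberately crude: it only rounds data sizes and ignores compression, redundancy overhead, and metadata. The assumption that files larger than 128k use full-size blocks throughout is a simplification, not something stated above.

```python
SECTOR = 512            # smallest block size mentioned above
MAX_BLOCK = 128 * 1024  # largest block size mentioned above


def round_up(n, unit):
    """Round n up to the next multiple of unit."""
    return -(-n // unit) * unit


def variable_block_usage(file_size):
    """Data bytes used when a file of up to MAX_BLOCK fits one right-sized block."""
    if file_size <= MAX_BLOCK:
        return round_up(file_size, SECTOR)
    # Assumption: larger files are stored entirely in MAX_BLOCK-sized blocks.
    return round_up(file_size, MAX_BLOCK)


def fixed_block_usage(file_size, block_size=8 * 1024):
    """Data bytes used with a fixed block size (8k by default)."""
    return round_up(file_size, block_size)


size = 35 * 1024
print(variable_block_usage(size) // 1024, fixed_block_usage(size) // 1024)  # 35 40
```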

Jeff

Toby Thain

2007-Jan-27 08:15 UTC

On 27-Jan-07, at 4:57 AM, Frank Cusack wrote:
> On January 27, 2007 12:27:17 AM -0200 Toby Thain  
> <toby at smartgames.ca> wrote:
>> On 26-Jan-07, at 11:34 PM, Pawel Jakub Dawidek wrote:
>>> 3. I created file system with huge amount of data, where most of the
>>> data is read-only. I change my server from intel to sparc64 machine.
>>> Adaptive endianness only change byte order to native on write and
>>> because
>>> file system is mostly read-only, it'll need to byteswap all the
>>> time.
>>> And here comes 'zfs rewrite'!
>>
>> Why would this help? (Obviously file data is never 'swapped').
>
> Metadata (incl checksums?) still has to be byte-swapped.

I'm aware, but is this really ever going to be an issue?

--T
> Or would
> atime updates also force a metadata update?  Or am I totally mistaken.
>
> -frank

Frank Cusack

2007-Jan-27 17:43 UTC

On January 27, 2007 6:15:29 AM -0200 Toby Thain <toby at smartgames.ca> wrote:
>
> On 27-Jan-07, at 4:57 AM, Frank Cusack wrote:
>
>> On January 27, 2007 12:27:17 AM -0200 Toby Thain
>> <toby at smartgames.ca> wrote:
>>> On 26-Jan-07, at 11:34 PM, Pawel Jakub Dawidek wrote:
>>>> 3. I created file system with huge amount of data, where most of the
>>>> data is read-only. I change my server from intel to sparc64 machine.
>>>> Adaptive endianness only change byte order to native on write and
>>>> because
>>>> file system is mostly read-only, it'll need to byteswap all the
>>>> time.
>>>> And here comes 'zfs rewrite'!
>>>
>>> Why would this help? (Obviously file data is never 'swapped').
>>
>> Metadata (incl checksums?) still has to be byte-swapped.
>
> I'm aware, but is this really ever going to be an issue?

Well, it IS extra work.  But yeah, it seems pretty insignificant to me.

-frank

Pawel Jakub Dawidek

2007-Jan-28 15:59 UTC

On Fri, Jan 26, 2007 at 06:08:50PM -0800, Darren Dunham wrote:
> > What do you guys think about implementing 'zfs/zpool rewrite' command?
> > It'll read every block older than the date when the command was executed
> > and write it again (using standard ZFS COW mechanism, similar to how
> > resilvering works, but the data is read from the same disk it is written
> > to).
>
> #1 How do you control I/O overhead?

The same way it is handled for scrub and resilver.

> #2 Snapshot blocks are never rewritten at the moment.  Most of your
>    suggestions seem to imply working on the "live" data, but doing that
>    for snapshots as well might be tricky.

Good point, see below.

> > 3. I created file system with huge amount of data, where most of the
> > data is read-only. I change my server from intel to sparc64 machine.
> > Adaptive endianness only change byte order to native on write and because
> > file system is mostly read-only, it'll need to byteswap all the time.
> > And here comes 'zfs rewrite'!
>
> It's only the metadata that is modified anyway, not the file data.  I
> would hope that this could be done more easily than a full tree rewrite
> (and again the issue with snapshots).  Also, the overhead there probably
> isn't going to be very high (since the metadata will be cached in most
> cases).

Agreed. In this case there should probably be a rewrite-only-metadata
mode. I agree the overhead is probably not high, but on the other hand,
I'm quite sure there are workloads that will see the difference, e.g.
'find / -name something'.

> Other than that, I'm guessing something like this will be necessary to
> implement disk evacuation/removal.  If you have to rewrite data from one
> disk to elsewhere in the pool, then rewriting the entire tree shouldn't
> be much harder.

How did I forget about this one? :) That's right. I believe ZFS will gain
such an ability at some point, and rewrite functionality fits very nicely
here: mark the disk/mirror/raid-z as no-more-writes and start the rewrite
process (probably limited to this entity only). To implement such
functionality there also has to be a way to migrate snapshot data, so
sooner or later there will be a need for moving snapshot blocks.

-- 
Pawel Jakub Dawidek                       http://www.wheel.pl
pjd at FreeBSD.org                           http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!

Frank Cusack

2007-Jan-28 21:42 UTC

On January 28, 2007 4:59:48 PM +0100 Pawel Jakub Dawidek <pjd at FreeBSD.org> wrote:
> On Fri, Jan 26, 2007 at 06:08:50PM -0800, Darren Dunham wrote:
>> > 3. I created file system with huge amount of data, where most of the
>> > data is read-only. I change my server from intel to sparc64 machine.
>> > Adaptive endianness only change byte order to native on write and
>> > because file system is mostly read-only, it'll need to byteswap all
>> > the time. And here comes 'zfs rewrite'!
>>
>> It's only the metadata that is modified anyway, not the file data.  I
>> would hope that this could be done more easily than a full tree rewrite
>> (and again the issue with snapshots).  Also, the overhead there probably
>> isn't going to be very high (since the metadata will be cached in most
>> cases).
>
> Agreed. Probably in this case there should be rewrite-only-metadata
> mode. I agree the overhead is probably not high, but on the other hand,
> I'm quite sure there are workload, which will see the difference, eg.
> 'find / -name something'.

I'd imagine even for that it wouldn't matter.  The I/O time will dwarf
any time spent byte-swapping.  Easily tested though.  Make sure you
set atime=off so that your find isn't causing write I/O.
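
For a rough feel of how cheap the swapping itself is, a userspace sketch like the following times a byte-swap over a buffer of 64-bit words. This says nothing about ZFS's own code path; the function name and the 16 MiB buffer size are arbitrary choices for illustration, and the timing will vary by machine.

```python
import time
from array import array


def swap_metadata_words(n_words):
    """Byte-swap n_words 64-bit words; return (swapped array, seconds taken)."""
    words = array("Q", range(n_words))
    start = time.perf_counter()
    words.byteswap()  # reverses the byte order of every item in place
    elapsed = time.perf_counter() - start
    return words, elapsed


# Hypothetical size: 2 Mi words of 8 bytes each, i.e. 16 MiB of "metadata".
words, elapsed = swap_metadata_words(2 * 1024 * 1024)
print(f"swapped {len(words) * words.itemsize} bytes in {elapsed:.4f} s")
```

Comparing that figure with the time the disks need to deliver the same amount of metadata gives a first estimate of whether the swap could ever be the bottleneck.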

-frank

Robert Milkowski

2007-Jan-29 00:07 UTC

Hello Jeff,

Saturday, January 27, 2007, 8:27:09 AM, you wrote:


JB> You're all correct.  File data is never byte-swapped.  Most metadata
JB> needs to be byte-swapped, but it's generally only 1-2% of your space.
JB> So the overhead shouldn't be significant, even if you never rewrite.

I remember that some time ago Sun touted ZFS as having some interesting new
technology to deal with endianness, with a patent pending for it.
Can you share what it was about?

-- 
Best regards,
 Robert                            mailto:rmilkowski at task.gda.pl
                                       http://milek.blogspot.com

Matthew Ahrens

2007-Feb-15 18:42 UTC

Pawel Jakub Dawidek wrote:
> What do you guys think about implementing 'zfs/zpool rewrite' command?
> It'll read every block older than the date when the command was executed
> and write it again (using standard ZFS COW mechanism, similar to how
> resilvering works, but the data is read from the same disk it is written
> to).

Yeah, that would be great, and in fact we are implementing such a thing
right now (to support pool shrinkage, among other features).  The tricky
part is dealing with block pointers that appear in multiple places (eg,
snapshots and clones).  Having "rewrite everything" result in more space
being used would not be acceptable.
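
The space problem with shared block pointers can be illustrated with a small Python model. Everything in it is hypothetical: datasets are plain lists of integer block IDs, and the two rewrite functions are invented for this sketch rather than taken from any ZFS code.

```python
def space_used(datasets):
    """Count each distinct physical block once, however many pointers refer to it."""
    return len({blk for ptrs in datasets.values() for blk in ptrs})


def naive_rewrite(datasets, name, next_id):
    """Rewrite one dataset's blocks to fresh locations; other references stay put."""
    new_ptrs = list(range(next_id, next_id + len(datasets[name])))
    return {**datasets, name: new_ptrs}


def rewrite_all_references(datasets, next_id):
    """Relocate each distinct block once and update every pointer to it."""
    mapping = {}
    for ptrs in datasets.values():
        for blk in ptrs:
            if blk not in mapping:
                mapping[blk] = next_id
                next_id += 1
    return {name: [mapping[b] for b in ptrs] for name, ptrs in datasets.items()}


# The live file system and its snapshot share blocks 0-2; block 3 is live-only.
datasets = {"fs": [0, 1, 2, 3], "fs@snap": [0, 1, 2]}
print(space_used(datasets))                                       # 4
print(space_used(naive_rewrite(datasets, "fs", next_id=100)))     # 7
print(space_used(rewrite_all_references(datasets, next_id=100)))  # 4
```

Rewriting only the live copy leaves the snapshot holding the old blocks, so space grows; keeping usage flat requires updating every place a block pointer appears, which is the tricky part.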

--matt
