Hi Everyone,

Is it possible to use send/recv to change the recordsize, or does each file need to be individually recreated/copied within a given dataset?

Is there a way to check the recordsize of a given file, assuming that the filesystem's recordsize was changed at some point?

Also - am I right in thinking that if a 4K write is made to a filesystem block with a recordsize of 8K, then the original block is read (assuming it's not in the ARC) before the new block is written elsewhere (the "copy", from copy on write)? This would be one of the reasons that aligning application IO sizes and filesystem record sizes is a good thing: where such IO is aligned, you remove the need for that original read?

Thanks,
Tristan
On Mon, 18 Jan 2010, Tristan Ball wrote:

> Is there a way to check the recordsize of a given file, assuming that the filesystem's recordsize was changed at some point?

This would be problematic since a file may consist of different-size records (at least I think so). If the recordsize was changed after the file was already created, then new/updated parts would use the new recordsize.

> Also - am I right in thinking that if a 4K write is made to a filesystem block with a recordsize of 8K, then the original block is read (assuming it's not in the ARC) before the new block is written elsewhere (the "copy", from copy on write)? This would be one of the reasons that aligning application IO sizes and filesystem record sizes is a good thing: where such IO is aligned, you remove the need for that original read?

This is exactly right. There is a very large performance hit if the block to be updated is no longer in the ARC and the update does not perfectly align to the origin and size of the underlying block. Applications which are aware of this (and which expect the total working set to be much larger than available cache) could choose to read and write more data than absolutely required so that zfs does not need to read an existing block in order to update it.

This also explains why the L2ARC can be so valuable, if the data then fits in the ARC.

Bob

--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
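A quick way to see the effect Bob describes, as a sketch - the dataset name is hypothetical and is assumed to have recordsize=8K, and the extra read only happens once the record has been evicted from the ARC (e.g. after an export/import):

   # create a file of full 8K records on a recordsize=8k dataset
   dd if=/dev/urandom of=/tank/fs/testfile bs=8k count=1024

   # full-record, aligned overwrite: the old block need not be read
   dd if=/dev/zero of=/tank/fs/testfile bs=8k count=1 conv=notrunc

   # partial-record overwrite (4K into an 8K record): zfs must first
   # read the old record (if not cached) before writing the modified copy
   dd if=/dev/zero of=/tank/fs/testfile bs=4k count=1 seek=1 conv=notrunc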
On Jan 17, 2010, at 11:59 AM, Tristan Ball wrote:

> Hi Everyone,
>
> Is it possible to use send/recv to change the recordsize, or does each file need to be individually recreated/copied within a given dataset?

Yes. The former does the latter.

> Is there a way to check the recordsize of a given file, assuming that the filesystem's recordsize was changed at some point?

I don't know of an easy way to do this. But it is also rarely needed. For most file system use it is best to let the recordsize scale to large values. It is only for fixed-record-length workloads (e.g. databases) that recordsize matching can significantly improve efficiency.

> Also - am I right in thinking that if a 4K write is made to a filesystem block with a recordsize of 8K, then the original block is read (assuming it's not in the ARC) before the new block is written elsewhere (the "copy", from copy on write)? This would be one of the reasons that aligning application IO sizes and filesystem record sizes is a good thing: where such IO is aligned, you remove the need for that original read?

No. Think of recordsize as a limit. As long as the recordsize >= 4 KB, a 4KB file will only use one 4KB record.
 -- richard
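For the send/recv route Richard confirms, a minimal sketch (pool and dataset names are hypothetical; the new recordsize is set on a parent dataset so the received dataset inherits it, since zfs recv creates the destination itself):

   # snapshot the source, then receive it under a parent with the new recordsize
   zfs snapshot tank/data@migrate
   zfs create -o recordsize=8k tank/migrated
   zfs send tank/data@migrate | zfs recv tank/migrated/data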
Richard Elling wrote:

> Tristan Ball wrote:
>> Also - am I right in thinking that if a 4K write is made to a filesystem block with a recordsize of 8K, then the original block is read (assuming it's not in the ARC) before the new block is written elsewhere (the "copy", from copy on write)? This would be one of the reasons that aligning application IO sizes and filesystem record sizes is a good thing: where such IO is aligned, you remove the need for that original read?
>
> No. Think of recordsize as a limit. As long as the recordsize >= 4 KB, a 4KB file will only use one 4KB record.
> -- richard

I didn't read Tristan's question as referring to a 4KB file. If a file with an 8KB recordsize already has one or more 8KB records, then a single 4KB non-synchronous write to a record not already in the ARC will require a read as part of the copy-on-write operation.

However, I'm assuming that multiple synchronous sequential writes to, say, an Oracle redo log, which are first committed to the ZIL, will generally coalesce before the file's records are COWed, thus avoiding reads for all but the last record (assuming it's not aligned or cached). But I'm always open to having my assumptions verified :)

Phil
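One way to test an assumption like this empirically (pool name and path are hypothetical): watch the pool's read column in zpool iostat while running a write-only workload; reads showing up during pure overwrites indicate the read-modify-write behaviour Phil describes.

   # terminal 1: per-second I/O statistics for the pool
   zpool iostat tank 1

   # terminal 2: overwrite an existing file with 4K writes into 8K records;
   # if the records are no longer cached, reads appear alongside the writes
   dd if=/dev/zero of=/tank/fs/existingfile bs=4k count=10000 conv=notrunc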
On Mon, Jan 18, 2010 at 10:22 AM, Richard Elling <richard.elling at gmail.com> wrote:

> On Jan 17, 2010, at 11:59 AM, Tristan Ball wrote:
>> Is there a way to check the recordsize of a given file, assuming that the filesystem's recordsize was changed at some point?
>
> I don't know of an easy way to do this.

Can't you use zdb? Something like

   zdb -dddd pool_name/fs_name

then read the output of dblk. You'd have to manually look for that file in the output though.

--
Fajar
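Building on Fajar's suggestion, you can avoid scanning the whole dump by pointing zdb at a single object; a file's object number is the same as its inode number (the file name and dataset are hypothetical):

   # find the file's object (inode) number
   ls -i /tank/fs/somefile

   # dump just that object; the dblk column shows its data block size
   zdb -dddd tank/fs <object_number>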
On 17/01/2010 20:34, Bob Friesenhahn wrote:

> On Mon, 18 Jan 2010, Tristan Ball wrote:
>
>> Is there a way to check the recordsize of a given file, assuming that the filesystem's recordsize was changed at some point?
>
> This would be problematic since a file may consist of different-size records (at least I think so). If the recordsize was changed after the file was already created, then new/updated parts would use the new recordsize.

A single file can only have one recordsize (except for a tail block, which might be shorter). So if you created a large file with the default recordsize of 128K and later changed the filesystem's recordsize to, let's say, 8K, it would affect only newly created files - the existing file would stay at 128K records. However, if you copied the file, the copy would use the new recordsize of 8K.

--
Robert Milkowski
http://milek.blogspot.com
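So a per-file migration boils down to rewriting each file after changing the property; a minimal sketch (paths are hypothetical; note the copy gets a new inode and breaks any space sharing with existing snapshots):

   # existing files keep their old recordsize; rewrite them to pick up the new one
   zfs set recordsize=8k tank/fs
   cp /tank/fs/bigfile /tank/fs/bigfile.new
   mv /tank/fs/bigfile.new /tank/fs/bigfile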