Hello! Assuming the default recordsize (FSB) in ZFS is 128k:

1 - If I have a file of 10k, ZFS will allocate an FSB of 10k. Right? Since ZFS block sizes are not static like in other filesystems, I don't have that old internal fragmentation...

2 - If the above is right, I don't need to adjust the recordsize (FSB) if I will handle a lot of tiny files. Right?

3 - If the two above are right, then tuning the recordsize is only important for files greater than the FSB. Let's say, 129k... but then another question: if the file is 129k, will ZFS allocate one filesystem block of 128k and another of... 1k? Or two of 128k?

4 - The last one... ;-) For the FSB allocation, how does ZFS know the file size, so it can tell whether the file is smaller than the FSB? Is it something related to the txg? When the write goes to disk, does ZFS know (somehow) whether that write is a whole file or just a piece of it?

Thanks a lot!

 Leal.
-- 
This message posted from opensolaris.org
On Fri, 5 Sep 2008, Marcelo Leal wrote:

> 4 - The last one... ;-) For the FSB allocation, how does ZFS know
> the file size, so it can tell whether the file is smaller than the FSB?
> Is it something related to the txg? When the write goes to disk, does
> ZFS know (somehow) whether that write is a whole file or just a piece of it?

For synchronous writes (file opened with the O_DSYNC option), ZFS must write the data based on what it has been provided in the write, so at any point in time the quality of the result (amount of data in the tail block) depends on the application's requests. However, if the application continues to extend the file via synchronous writes, existing data in the sub-sized "tail" block will be re-written to a new location (due to ZFS COW) with the extra data added. This means that the filesystem block size is more important for synchronous writes, particularly if there is insufficient RAM to cache the already-written block.

For asynchronous writes, ZFS will buffer writes in RAM for up to five seconds before actually writing them. This buffering allows ZFS to make better-informed decisions about how to write the data, so that the data is written to full blocks as contiguously as possible. If the application writes asynchronously but then issues an fsync() call, any cached data will be committed to disk at that time.

It can be seen that for asynchronous writes, the quality of the written data layout is somewhat dependent on how much RAM the system has available and how fast the data is written. With more RAM there can be more useful write caching (up to five seconds), and ZFS can make better decisions when it writes the data, so that the data in a file can be written optimally, even with the pressure of multi-user writes.

Bob
=====================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
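[A minimal sketch of the two write paths described above, not code from the thread; the dataset paths are hypothetical placeholders. With O_DSYNC each write() must be committed as handed over, while buffered writes give ZFS a window to coalesce data before an explicit fsync():]

    import os

    # Paths below are placeholders for files on a ZFS dataset (assumption).

    # Synchronous path: O_DSYNC forces each write() to be committed before
    # it returns, so ZFS must lay out whatever amount of data it is handed.
    fd = os.open("/tank/fs/sync-file", os.O_WRONLY | os.O_CREAT | os.O_DSYNC, 0o644)
    for _ in range(16):
        os.write(fd, b"x" * 8192)   # sixteen 8k appends, each committed on its own
    os.close(fd)

    # Asynchronous path: writes sit in RAM (up to ~5 seconds) so ZFS can
    # coalesce them into full records; fsync() commits whatever is cached.
    fd = os.open("/tank/fs/async-file", os.O_WRONLY | os.O_CREAT, 0o644)
    for _ in range(16):
        os.write(fd, b"x" * 8192)
    os.fsync(fd)                    # everything buffered so far goes to disk now
    os.close(fd)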
> On Fri, 5 Sep 2008, Marcelo Leal wrote:
> > 4 - The last one... ;-) For the FSB allocation, how does ZFS know
> > the file size, so it can tell whether the file is smaller than the FSB?
> > Is it something related to the txg? When the write goes to disk, does
> > ZFS know (somehow) whether that write is a whole file or just a piece of it?
>
> For synchronous writes (file opened with the O_DSYNC option), ZFS must
> write the data based on what it has been provided in the write, so at
> any point in time the quality of the result (amount of data in the tail
> block) depends on the application's requests. However, if the
> application continues to extend the file via synchronous writes,
> existing data in the sub-sized "tail" block will be re-written to a new
> location (due to ZFS COW) with the extra data added. This means that
> the filesystem block size is more important for synchronous writes,
> particularly if there is insufficient RAM to cache the already-written
> block.

If I understand well, the recordsize is really important for big files, because with small files and small updates we have a lot of chances to have the data well organized on disk. I think the problem is the big files... where we have tiny updates. At the pool's creation time the recordsize is 128k, but I don't know if that limit is real when, let's say, we are copying a DVD image. I think the recordsize could be larger there. If so, if for larger files we could have a recordsize of... 1MB, what would happen if we then changed it to 1k?

> For asynchronous writes, ZFS will buffer writes in RAM for up to five
> seconds before actually writing them. This buffering allows ZFS to make
> better-informed decisions about how to write the data, so that the data
> is written to full blocks as contiguously as possible. If the
> application writes asynchronously but then issues an fsync() call, any
> cached data will be committed to disk at that time.
>
> It can be seen that for asynchronous writes, the quality of the written
> data layout is somewhat dependent on how much RAM the system has
> available and how fast the data is written. With more RAM there can be
> more useful write caching (up to five seconds), and ZFS can make better
> decisions when it writes the data, so that the data in a file can be
> written optimally, even with the pressure of multi-user writes.

Agree. Any other ZFS experts to answer the first questions? ;-)

> Bob
> =====================================
> Bob Friesenhahn
> bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
> GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss at opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

Thanks bfriesen!

 Leal
-- 
This message posted from opensolaris.org
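[To make the "big file, tiny update" case concrete: a small random overwrite inside a large file touches a whole record, so with a 128k recordsize ZFS has to read, modify, and COW-rewrite the full 128k block that holds those few bytes. A minimal sketch, assuming a hypothetical path to an already existing large file:]

    import os

    # Hypothetical file on a ZFS dataset with the default 128k recordsize;
    # the file is assumed to already exist and be large.
    path = "/tank/fs/bigfile.img"

    fd = os.open(path, os.O_WRONLY)
    os.lseek(fd, 700 * 1024 * 1024, os.SEEK_SET)   # seek into the middle of the file
    os.write(fd, b"y" * 8192)                      # an 8k update...
    os.fsync(fd)
    os.close(fd)
    # ...but the record containing this offset is 128k, so ZFS must read
    # that whole record (unless it is already cached), apply the 8k change,
    # and COW-write a new 128k block elsewhere. With a smaller recordsize
    # (e.g. 8k, as often used for databases) the read-modify-write is smaller.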
Hello Marcelo,

Monday, September 8, 2008, 1:51:09 PM, you wrote:

ML> If I understand well, the recordsize is really important for big
ML> files, because with small files and small updates we have a lot of
ML> chances to have the data well organized on disk. I think the problem
ML> is the big files... where we have tiny updates. At the pool's
ML> creation time the recordsize is 128k, but I don't know if that limit
ML> is real when, let's say, we are copying a DVD image. I think the
ML> recordsize could be larger there. If so, if for larger files we could
ML> have a recordsize of... 1MB, what would happen if we then changed it to 1k?

The maximum record size currently supported is 128KB.

-- 
Best regards,
 Robert Milkowski                       mailto:milek at task.gda.pl
                                        http://milek.blogspot.com
Hello milek,

Does that information remain true?

  ZFS algorithm for selecting block sizes:
  - The initial block size is the smallest supported block size larger
    than the first write to the file.
  - Grow to the next largest block size for the entire file when the
    total file length increases beyond the current block size (up to
    the maximum block size).
  - Shrink the block size when the entire file will fit in a single
    smaller block.

  ZFS currently supports nine block sizes, from 512 bytes to 128K.
  Larger block sizes could be supported in the future, but see roch's
  blog on why 128k is enough.

Thanks.
-- 
This message posted from opensolaris.org
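[A minimal sketch of one reading of the rule quoted above; this is not code from ZFS itself, just the quoted selection rule written out, with the nine sizes taken as the powers of two from 512 bytes to 128K:]

    # The nine block sizes the quoted text refers to: 512, 1k, 2k, ..., 128k.
    BLOCK_SIZES = [512 << n for n in range(9)]       # 512 .. 131072

    def pick_block_size(file_length: int) -> int:
        """Smallest supported block size that holds the whole file,
        capped at the 128k maximum (per the rule quoted above)."""
        for size in BLOCK_SIZES:
            if file_length <= size:
                return size
        return BLOCK_SIZES[-1]                       # larger files use the 128k maximum

    # Examples matching the sizes asked about earlier in the thread:
    print(pick_block_size(10 * 1024))    # 10k file  -> 16384 under this reading
    print(pick_block_size(129 * 1024))   # 129k file -> 131072 (capped at the 128k maximum)

[Under this reading a 10k file would get a single 16k block rather than a 10k one, but whether that text still matches the current implementation is exactly the question.]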