zfs discuss - Nov 2007 - ZFS + DB + default blocksize

Louwtjie Burger

2007-Nov-08 07:56 UTC

[zfs-discuss] ZFS + DB + default blocksize

Hi

What is the impact of not aligning the DB blocksize (16K) with ZFS,
especially when it comes to random reads on single HW RAID LUN.

How would one go about measuring the impact (if any) on the workload?

Thank you

Richard Elling

2007-Nov-08 18:56 UTC

[zfs-discuss] ZFS + DB + default blocksize

Louwtjie Burger wrote:
> Hi
>
> What is the impact of not aligning the DB blocksize (16K) with ZFS,
> especially when it comes to random reads on single HW RAID LUN.
>   
Potentially, depending on the write part of the workload, the system may
read 128 kBytes to get a 16 kByte block.  This is not efficient and may be
noticeable as a performance degradation.

> How would one go about measuring the impact (if any) on the workload?
>   
Try it and see if it meets your requirements.
 -- richard
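
One way to "try it and see" is to time random 16K reads against the same data file stored on datasets with different recordsize settings. The sketch below is only an assumption about how such a test might look, not a tool mentioned in this thread; the function name and its parameters are hypothetical. Reads served from the ARC or another cache will make the numbers look better than the disks really are.

```python
import os
import random
import sys
import time


def random_read_benchmark(path, block_size=16 * 1024, reads=1000, seed=0):
    """Time `reads` random, block-aligned reads of `block_size` bytes from `path`."""
    blocks = os.path.getsize(path) // block_size
    if blocks == 0:
        raise ValueError("file is smaller than one block")
    rng = random.Random(seed)  # fixed seed so runs are comparable
    fd = os.open(path, os.O_RDONLY)
    total = 0
    try:
        start = time.perf_counter()
        for _ in range(reads):
            offset = rng.randrange(blocks) * block_size
            total += len(os.pread(fd, block_size, offset))
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    iops = reads / elapsed if elapsed > 0 else float("inf")
    return {"reads": reads, "bytes": total, "seconds": elapsed, "iops": iops}


if __name__ == "__main__":
    # Usage: python bench.py /path/to/datafile
    print(random_read_benchmark(sys.argv[1]))
```

For a meaningful comparison the file should be much larger than memory, and it has to be rewritten after the recordsize is changed, since the property only affects blocks written afterwards.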

Zoltan Farkas

2007-Nov-08 20:43 UTC

[zfs-discuss] ZFS + DB + default blocksize

Is compression impacted when setting block size?

--zoly

Louwtjie Burger

2007-Nov-08 21:40 UTC

[zfs-discuss] ZFS + DB + default blocksize

On 11/8/07, Richard Elling <Richard.Elling at sun.com> wrote:
> Louwtjie Burger wrote:
> > Hi
> >
> > What is the impact of not aligning the DB blocksize (16K) with ZFS,
> > especially when it comes to random reads on single HW RAID LUN.
> >
>
> Potentially, depending on the write part of the workload, the system may
> read 128 kBytes to get a 16 kByte block.  This is not efficient and may be
> noticeable as a performance degradation.

Hi Richard.

The time it takes to position the drive heads at the start of the 16K
block is longer than the time it takes to read the extra 112 KB.
Depending on where on the platter the block sits, one could calculate it.

Also, doesn't ZFS do some form of read-ahead (64 KB) anyway?

I suspect that the reason for blocksize alignment is not so much for
50 IOPS. I think it only rears its ugly head when you're doing thousands
of IOPS and the time it takes to read the extra data starts adding up.
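
As a rough illustration of that calculation, here is a sketch that compares per-I/O service time for a 16K and a 128K read on a single disk. The drive figures (8 ms positioning time, 60 MB/s media rate) are assumptions picked for the example, not measurements from this thread, and a HW RAID LUN with several spindles and a controller cache will behave differently.

```python
def service_time_ms(io_bytes, positioning_ms, media_mb_per_s):
    """Positioning time (seek + rotation) plus transfer time, in milliseconds."""
    transfer_ms = io_bytes / (media_mb_per_s * 1_000_000) * 1000
    return positioning_ms + transfer_ms


def max_iops(io_bytes, positioning_ms, media_mb_per_s):
    """Upper bound on random IOPS for one disk serving one I/O at a time."""
    return 1000 / service_time_ms(io_bytes, positioning_ms, media_mb_per_s)


if __name__ == "__main__":
    POSITIONING_MS = 8.0  # assumed average seek + rotational latency
    MEDIA_MB_S = 60.0     # assumed sustained media transfer rate
    for size_kb in (16, 128):
        size = size_kb * 1024
        print(f"{size_kb:>3}K: "
              f"{service_time_ms(size, POSITIONING_MS, MEDIA_MB_S):.2f} ms per I/O, "
              f"{max_iops(size, POSITIONING_MS, MEDIA_MB_S):.0f} IOPS max")
```

Under these assumptions the extra 112 KB costs less than the positioning time itself, which matches the argument above: the penalty per I/O is modest and only adds up at high I/O rates.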

Richard Elling

2007-Nov-08 21:52 UTC

[zfs-discuss] ZFS + DB + default blocksize

Louwtjie Burger wrote:
> On 11/8/07, Richard Elling <Richard.Elling at sun.com> wrote:
>> Louwtjie Burger wrote:
>>> Hi
>>>
>>> What is the impact of not aligning the DB blocksize (16K) with ZFS,
>>> especially when it comes to random reads on single HW RAID LUN.
>>>
>> Potentially, depending on the write part of the workload, the system may
>> read 128 kBytes to get a 16 kByte block.  This is not efficient and may be
>> noticeable as a performance degradation.
>
> Hi Richard.
>
> The amount of time it takes to position the drive to get to the start
> of the 16K block takes longer than the time it takes to read the extra
> 112 KB ... depending where on the platter this is one could calculate
> it.
>
> Also... doesn't ZFS do some form of read ahead .. 64KB anyways?
>
> I suspect that the reason for the blocksize alignment is not so much
> for 50 IOPS ... I think it only shows its ugly head when you're doing
> 1000s of IOPS and the time it takes to read extra data starts adding
> up.
>
Yes, I agree.

On a side note, we are starting to look much more closely
at efficiency and trying to identify good ways to measure
efficiency for the highly parallel systems of the present
and future.  The early work has been around power efficiency
as the vendors berate each other as being power hungry.
But we expect the efficiency improvements to continue
across all aspects of computing.  Hence the statement that
reading extra stuff is inefficient, but not necessarily poorly
performing :-)
 -- richard

eric kustarz

2007-Nov-08 23:07 UTC

[zfs-discuss] ZFS + DB + default blocksize

>
> Also... doesn't ZFS do some form of read ahead .. 64KB anyways?
>
I believe you are referring to the vdev cache here.  Check out:
blogs.sun.com/erickustarz/entry/vdev_cache_improvements_to_help

eric

Roch - PAE

2007-Nov-12 11:14 UTC

[zfs-discuss] ZFS + DB + default blocksize

Louwtjie Burger writes:
 > Hi
 > 
 > What is the impact of not aligning the DB blocksize (16K) with ZFS,
 > especially when it comes to random reads on single HW RAID LUN.
 > 
 > How would one go about measuring the impact (if any) on the workload?
 > 

The DB will have a bigger in-memory footprint, as you will need to keep
the ZFS record in memory for the lifespan of the DB block.

This probably means you want to partition memory between the DB cache and
the ZFS ARC according to the ratio of DB blocksize to ZFS recordsize.

Then I imagine you have multiple spindles associated with the LUN. If
your LUN is capable of 2000 IOPS over a 200 MB/sec data channel, then
during 1 second at full speed:

	2000 IOPS * 16K = 32 MB of data transfer,

and this fits within the channel's capability. But using, say, a ZFS
recordsize of 128K, then

	2000 IOPS * 128K = 256 MB,

which overloads the channel. So in this example the data channel would
saturate first, preventing you from reaching those 2000 IOPS. But with
enough memory and data channel throughput, it's a good idea to keep
the ZFS recordsize large.
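
The arithmetic above can be sketched as a small calculation. The function names are made up for illustration, and the units follow the example's own rounding (1 MB = 1000 K); the 2000 IOPS and 200 MB/sec figures are the ones from the example.

```python
def required_mb_per_s(iops, record_kb):
    """Channel bandwidth needed to sustain `iops` reads of `record_kb` each."""
    return iops * record_kb / 1000


def achievable_iops(lun_iops, channel_mb_per_s, record_kb):
    """IOPS actually reachable once the data channel is taken into account."""
    channel_limit = channel_mb_per_s * 1000 / record_kb
    return min(lun_iops, channel_limit)


if __name__ == "__main__":
    LUN_IOPS = 2000     # what the spindles behind the LUN can deliver
    CHANNEL_MB_S = 200  # data channel throughput
    for record_kb in (16, 128):
        need = required_mb_per_s(LUN_IOPS, record_kb)
        got = achievable_iops(LUN_IOPS, CHANNEL_MB_S, record_kb)
        print(f"{record_kb:>3}K records: need {need:.0f} MB/s, "
              f"achievable {got:.0f} IOPS")
```

With 128K records the channel, not the spindles, becomes the limit, so the LUN cannot reach its 2000 IOPS.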

-r

Anton B. Rang

2007-Nov-13 04:51 UTC

[zfs-discuss] ZFS + DB + default blocksize

Yes.  Blocks are compressed individually, so a smaller block size will (on
average) lead to less compression.  (Assuming that your data is compressible at
all, that is.)
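
A small sketch of the effect, using Python's zlib as a stand-in compressor (an assumption made for convenience; it is not the algorithm ZFS uses) and synthetic data that repeats every 8 KB. Each block is compressed on its own, so every block has to pay for the first copy of the pattern again, and smaller blocks pay that cost more often.

```python
import random
import zlib


def compressed_size(data, block_size):
    """Total size when every block_size chunk is compressed independently."""
    total = 0
    for offset in range(0, len(data), block_size):
        total += len(zlib.compress(data[offset:offset + block_size]))
    return total


def sample_data(size=1024 * 1024, period=8 * 1024):
    """Synthetic data: one pseudo-random pattern of `period` bytes, repeated."""
    rng = random.Random(0)
    pattern = bytes(rng.getrandbits(8) for _ in range(period))
    return pattern * (size // period)


if __name__ == "__main__":
    data = sample_data()
    for block_kb in (16, 128):
        size = compressed_size(data, block_kb * 1024)
        print(f"{block_kb:>3}K blocks: {size} bytes after compression")
```

How large the difference is on real database files depends entirely on the data, so this only shows the direction of the effect.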
 
 
Jesus Cea

2007-Nov-14 21:05 UTC

[zfs-discuss] ZFS + DB + default blocksize

Louwtjie Burger wrote:
> On 11/8/07, Richard Elling <Richard.Elling at sun.com> wrote:
>> Potentially, depending on the write part of the workload, the system may
>> read 128 kBytes to get a 16 kByte block.  This is not efficient and may be
>> noticeable as a performance degradation.
> 
> Hi Richard.
> 
> The amount of time it takes to position the drive to get to the start
> of the 16K block takes longer than the time it takes to read the extra
> 112 KB ... depending where on the platter this is one could calculate
> it.

Worse yet, if your ZFS recordsize is 128 KB and your database block size
is 16 KB, ZFS would load 128 KB, update 16 KB inside it, and write 128 KB
back out to disk.

If both block sizes are equal, you don't need the read part. That is a
huge win.
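
The read-modify-write cost can be put into numbers with a simplified model. It assumes the database write is aligned to a record boundary and that the affected record is not already cached; the function name is hypothetical.

```python
def update_io(db_block, recordsize):
    """Return (bytes_read, bytes_written) for one aligned DB block update.

    Simplified model: a partially overwritten record must be read in full
    first, and every touched record is written out in full.
    """
    records_touched = -(-db_block // recordsize)  # ceiling division
    partial = db_block % recordsize != 0          # last record only partly covered
    bytes_read = recordsize if partial else 0
    bytes_written = records_touched * recordsize
    return bytes_read, bytes_written


if __name__ == "__main__":
    KB = 1024
    for recordsize in (128 * KB, 16 * KB):
        read, written = update_io(16 * KB, recordsize)
        print(f"recordsize {recordsize // KB:>3}K: "
              f"read {read // KB}K, write {written // KB}K per 16K update")
```

With matching sizes the read disappears and the write shrinks to the 16 KB that actually changed.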
