Hi,

What is the impact of not aligning the DB block size (16K) with the ZFS recordsize, especially when it comes to random reads on a single HW RAID LUN?

How would one go about measuring the impact (if any) on the workload?

Thank you
Louwtjie Burger wrote:
> What is the impact of not aligning the DB blocksize (16K) with ZFS,
> especially when it comes to random reads on single HW RAID LUN.

Potentially, depending on the write part of the workload, the system may read 128 kBytes to get a 16 kByte block. This is not efficient and may be noticeable as a performance degradation.

> How would one go about measuring the impact (if any) on the workload?

Try it and see if it meets your requirements.
-- richard
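One way to "try it and see": a minimal Python sketch that times random, aligned reads against a test file, so the same test can be run on a dataset with the default 128K recordsize and on one set to 16K (e.g. zfs set recordsize=16K tank/dbtest, where tank/dbtest is a hypothetical dataset name). The function name and defaults are illustrative assumptions. Keep in mind that recordsize only applies to files written after the property is set, so the test file must be created afterwards, and it should be much larger than RAM, or the ARC will serve most of the reads.

```python
import os
import random
import time


def random_read_iops(path, io_size=16 * 1024, n_reads=2000, seed=0):
    """Time n_reads random pread() calls of io_size bytes, each aligned to
    io_size, against path. Returns (IOPS, MB/s). Hypothetical helper."""
    n_slots = os.path.getsize(path) // io_size
    if n_slots == 0:
        raise ValueError("file is smaller than io_size")
    rng = random.Random(seed)
    fd = os.open(path, os.O_RDONLY)
    try:
        total = 0
        start = time.perf_counter()
        for _ in range(n_reads):
            offset = rng.randrange(n_slots) * io_size  # aligned random offset
            total += len(os.pread(fd, io_size, offset))
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return n_reads / elapsed, total / elapsed / 1e6
```

Call it once per dataset, e.g. random_read_iops("/tank/dbtest/testfile") (a hypothetical path), and compare the IOPS. Ideally repeat the comparison with the real DB workload, since that is what has to meet the requirements.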
Is compression impacted when setting block size?
--zoly

-----Original Message-----
From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-bounces at opensolaris.org] On Behalf Of Richard Elling
Sent: Thursday, November 08, 2007 1:56 PM
To: Louwtjie Burger
Cc: zfs-discuss at opensolaris.org
Subject: Re: [zfs-discuss] ZFS + DB + default blocksize
_______________________________________________
zfs-discuss mailing list
zfs-discuss at opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
On 11/8/07, Richard Elling <Richard.Elling at sun.com> wrote:
> Louwtjie Burger wrote:
> > What is the impact of not aligning the DB blocksize (16K) with ZFS,
> > especially when it comes to random reads on single HW RAID LUN.
>
> Potentially, depending on the write part of the workload, the system may
> read 128 kBytes to get a 16 kByte block. This is not efficient and may be
> noticeable as a performance degradation.

Hi Richard.

Positioning the drive head at the start of the 16K block takes longer than reading the extra 112 KB; depending on where on the platter the block sits, one could calculate the difference.

Also... doesn't ZFS do some form of read-ahead, 64KB anyway?

I suspect that block size alignment does not matter much at 50 IOPS. I think it only shows its ugly head when you're doing thousands of IOPS and the time it takes to read the extra data starts adding up.
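Louwtjie's point can be put into rough numbers with a single-disk model: average seek, plus half a revolution, plus media transfer time. The figures below (8 ms seek, 10,000 rpm, 80 MB/s media rate) are illustrative assumptions, not values from the thread, and a HW RAID LUN with many spindles and its own cache will behave differently.

```python
def random_read_ms(io_kb, seek_ms=8.0, rpm=10_000, media_mb_s=80.0):
    """Rough service time (ms) of one random read on a single disk:
    average seek + half a revolution + media transfer.
    Defaults are illustrative assumptions, not measured values."""
    rotational_ms = 0.5 * 60_000 / rpm              # half a revolution on average
    transfer_ms = io_kb / 1024 / media_mb_s * 1000  # MB taken as 1024 KB here
    return seek_ms + rotational_ms + transfer_ms


for kb in (16, 128):
    t = random_read_ms(kb)
    print(f"{kb:>3} KB read: {t:.2f} ms, about {1000 / t:.0f} IOPS per disk")
```

Under these assumptions a 128K read takes only about 12% longer than a 16K read (roughly 12.6 ms vs 11.2 ms), so at low IOPS the extra 112 KB barely registers; the gap only starts to matter as the disks approach their IOPS limit.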
Louwtjie Burger wrote:
> The amount of time it takes to position the drive to get to the start
> of the 16K block takes longer than the time it takes to read the extra
> 112 KB ... depending where on the platter this is one could calculate
> it.
>
> I suspect that the reason for the blocksize alignment is not so much
> for 50 IOPS ... I think it only shows its ugly head when you're doing
> thousands of IOPS and the time it takes to read extra data starts adding
> up.

Yes, I agree.

On a side note, we are starting to look much more closely at efficiency and trying to identify good ways to measure it for the highly parallel systems of the present and future. The early work has been around power efficiency, as vendors berate each other for being power hungry. But we expect the efficiency improvements to continue across all aspects of computing. Hence the statement that reading extra stuff is inefficient, but not necessarily poorly performing :-)
-- richard
> > Also... doesn't ZFS do some form of read-ahead, 64KB anyway?

I believe you are referring to the vdev cache here. Check out:
http://blogs.sun.com/erickustarz/entry/vdev_cache_improvements_to_help

eric
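To make the read-ahead question concrete, here is a simplified model of how a small-read device cache can inflate a read into an aligned 64K device read. The 16K threshold and 64K line size are assumptions, believed to correspond to the defaults of the zfs_vdev_cache_max and zfs_vdev_cache_bshift tunables of that era; this is a sketch, not ZFS code, and the linked post describes how the vdev cache behaviour was being changed.

```python
# Assumed defaults (check the tunables of your release):
CACHE_MAX = 16 * 1024    # reads larger than this bypass the cache
CACHE_LINE = 64 * 1024   # size and alignment of each inflated device read


def device_read(offset, size):
    """Return (offset, size) of the device I/O issued for a logical read
    under this simplified model. Not ZFS code."""
    if size > CACHE_MAX:
        return offset, size                    # large reads go straight through
    line_start = offset - offset % CACHE_LINE  # align down to a 64K boundary
    if offset + size > line_start + CACHE_LINE:
        return offset, size                    # straddles two lines: no inflation
    return line_start, CACHE_LINE              # read the whole 64K line


print(device_read(70_000, 4_096))   # (65536, 65536): inflated to an aligned 64K read
print(device_read(0, 128 * 1024))   # (0, 131072): a full 128K record is not inflated
```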
Louwtjie Burger writes:
> What is the impact of not aligning the DB blocksize (16K) with ZFS,
> especially when it comes to random reads on single HW RAID LUN.
>
> How would one go about measuring the impact (if any) on the workload?

The DB will have a bigger in-memory footprint, as you will need to keep the ZFS record for the lifespan of the DB block. This probably means you want to partition memory between the DB cache and the ZFS ARC according to the ratio of DB blocksize to ZFS recordsize.

Then, I imagine you have multiple spindles associated with the LUN. If your LUN is capable of 2000 IOPS over a 200 MB/s data channel, then during 1 second at full speed, 2000 IOPS * 16K = 32 MB of data transfer, which fits within the channel's capability. But with ZFS blocks of, say, 128K, 2000 IOPS * 128K = 256 MB, which overloads the channel. So in this example the data channel would saturate first, preventing you from reaching those 2000 IOPS.

But with enough memory and data channel throughput, it's a good idea to keep the ZFS recordsize large.

-r
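The channel arithmetic above as a short Python sketch; the function name is hypothetical, and decimal units (1 MB = 1000 KB) are used to match the figures in the message.

```python
def achievable_iops(lun_iops, channel_mb_s, io_kb):
    """Sustainable IOPS when every I/O moves io_kb over the data channel:
    the lower of the LUN's IOPS limit and what the channel can carry.
    Decimal units (1 MB = 1000 KB), as in the example above."""
    channel_iops = channel_mb_s * 1000 / io_kb
    return min(lun_iops, channel_iops)


for io_kb in (16, 128):
    needed_mb_s = 2000 * io_kb / 1000
    print(f"{io_kb:>3}K I/O: 2000 IOPS needs {needed_mb_s:.0f} MB/s; "
          f"achievable ~{achievable_iops(2000, 200, io_kb):.0f} IOPS")
```

With 128K records the 200 MB/s channel caps the LUN at about 1560 IOPS instead of 2000, while 16K I/Os leave the LUN's own IOPS limit as the bottleneck.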
Yes. Blocks are compressed individually, so a smaller block size will (on average) lead to less compression. (Assuming that your data is compressible at all, that is.) This message posted from opensolaris.org
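A quick way to see the effect, using Python's zlib as a stand-in for the compression algorithms ZFS uses: compress the same data as independent 16K chunks and as a single 128K chunk. Each chunk starts with an empty dictionary and carries its own headers, which is where the loss comes from. The vocabulary-based test data is made up for illustration; real results depend entirely on the data.

```python
import random
import zlib


def compressed_size(data, block_size):
    """Total bytes after compressing data as independent block_size chunks,
    one zlib stream per chunk (a stand-in for per-block compression)."""
    return sum(len(zlib.compress(data[i:i + block_size]))
               for i in range(0, len(data), block_size))


# Made-up compressible data: 128 KB of words drawn from a 50-word vocabulary.
rng = random.Random(42)
vocab = [f"word{n}" for n in range(50)]
text = " ".join(rng.choice(vocab) for _ in range(20_000)).encode()[:128 * 1024]

for bs in (16 * 1024, 128 * 1024):
    ratio = len(text) / compressed_size(text, bs)
    print(f"{bs // 1024:>3} KB blocks: compression ratio {ratio:.2f}")
```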
Louwtjie Burger wrote:
> On 11/8/07, Richard Elling <Richard.Elling at sun.com> wrote:
>> Potentially, depending on the write part of the workload, the system may
>> read 128 kBytes to get a 16 kByte block. This is not efficient and may be
>> noticeable as a performance degradation.
>
> Hi Richard.
>
> The amount of time it takes to position the drive to get to the start
> of the 16K block takes longer than the time it takes to read the extra
> 112 KB ... depending where on the platter this is one could calculate
> it.

Worse yet, if your ZFS blocksize is 128 KB and your database block size is 16 KB, ZFS would read 128 KB, update 16 KB inside it and write 128 KB back out to the disk. If both block sizes are equal, you don't need the read part. That is a huge win.

--
Jesus Cea Avion
jcea at argo.es http://www.argo.es/~jcea/
jabber / xmpp:jcea at jabber.org
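The read-modify-write cost Jesus describes, as a minimal sketch. It assumes the DB block is no larger than one record and ignores compression; the record_cached flag is an assumption not taken from the thread, modelling the case where the record is already in the ARC and no read is needed.

```python
def io_per_update_kb(db_block_kb, recordsize_kb, record_cached=False):
    """(read_kb, write_kb) of disk I/O to update one DB block, following
    the read-modify-write reasoning above. Hypothetical helper."""
    if db_block_kb > recordsize_kb:
        raise ValueError("sketch assumes the DB block fits in one record")
    partial = db_block_kb < recordsize_kb
    # A partial-record update must first read the rest of the record,
    # unless the record is already cached.
    read_kb = recordsize_kb if partial and not record_cached else 0
    write_kb = recordsize_kb  # the whole record is written back out
    return read_kb, write_kb


for rs in (128, 16):
    r, w = io_per_update_kb(16, rs)
    print(f"recordsize {rs:>3}K: read {r}K + write {w}K to change 16K")
```

With a 128K recordsize a single 16K update costs 256 KB of disk I/O in this model; with matching sizes it costs a single 16 KB write.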