Hi All,

Is it not possible to increase the ZFS record size beyond 128K? I am using
Solaris 10 Update 4.

I get the following error when I try to set the ZFS record size to 1024K:

# zfs set recordsize=1024k md9/test
cannot set property for 'md9/test': 'recordsize' must be power of 2 from 512 to 128k

Thanks
Manoj Nayak
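The error message encodes the whole rule: recordsize must be a power of 2
between one 512-byte sector and 128K (the cap in this ZFS version, known in
the source as SPA_MAXBLOCKSIZE). A minimal sketch of that check, assuming a
hypothetical helper name (this is not ZFS source code):

```python
SPA_MINBLOCKSIZE = 512         # one disk sector
SPA_MAXBLOCKSIZE = 128 * 1024  # 128K recordsize cap in this ZFS version

def recordsize_is_valid(size: int) -> bool:
    """Return True if `size` is a power of 2 in [512, 128K]."""
    in_range = SPA_MINBLOCKSIZE <= size <= SPA_MAXBLOCKSIZE
    power_of_two = size > 0 and (size & (size - 1)) == 0
    return in_range and power_of_two

# 1024K (1048576) is a power of 2 but exceeds the 128K cap, so
# `zfs set recordsize=1024k` is rejected with the error above.
```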
Why do you want records greater than 128K? Do check out:
http://blogs.sun.com/roch/entry/128k_suffice

-r

Manoj Nayak writes:
> Is it not possible to increase the ZFS record size beyond 128K? I am
> using Solaris 10 Update 4.
>
> I get the following error when I try to set the ZFS record size to 1024K:
>
> # zfs set recordsize=1024k md9/test
> cannot set property for 'md9/test': 'recordsize' must be power of 2 from
> 512 to 128k
_______________________________________________
zfs-discuss mailing list
zfs-discuss at opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Roch - PAE wrote:
> Why do you want records greater than 128K?

A single-parity RAID-Z pool consisting of four disks is created on a
Thumper running Solaris 10 Update 4, and a ZFS filesystem is created in
the pool. 1 MB of data is written to a file in that filesystem using a
single write(2) system call. However, dtrace shows a large number of
small physical disk reads when the same 1 MB is read back with a single
read(2) system call. recordsize is set to 128K.

What's going on here? Why are so many small block reads issued? Does some
flag (0LARGE?) need to be specified when the file is created so that ZFS
uses a bigger block size?

I think ZFS chooses block sizes as per the following statements:

ZFS files smaller than the recordsize are stored using a single
filesystem block (FSB) of variable length, in multiples of a disk sector
(512 bytes). Larger files are stored using multiple FSBs, each of
recordsize bytes, with a default value of 128K.

dtrace output:

Device paths:
sd6  = /devices/pci@1,0/pci1022,7458@3/pci11ab,11ab@1/disk@1,0:a
sd13 = /devices/pci@0,0/pci1022,7458@1/pci11ab,11ab@1/disk@3,0:a
sd21 = /devices/pci@1,0/pci1022,7458@4/pci11ab,11ab@1/disk@4,0:a
sd48 = /devices/pci@2,0/pci1022,7458@8/pci11ab,11ab@1/disk@6,0:a

Event     Device  RW  Block Size  Block No  Offset  Path
sc-read   .       R   1052672               0       /mnt/bank0/media/CAD1/4.1
fop_read  .       R   1052672               0       /mnt/bank0/media/CAD1/4.1
disk_io   sd6     R   65536       50816     0       <none>
disk_io   sd21    R   65536       50816     0       <none>
disk_io   sd48    R   65536       50816     0       <none>
disk_io   sd48    R   131072      47839     0       <none>
disk_io   sd48    R   87552       48095     0       <none>
disk_io   sd48    R   43520       48352     0       <none>
disk_io   sd48    R   43520       48523     0       <none>
disk_io   sd48    R   87552       48950     0       <none>
disk_io   sd48    R   87552       49121     0       <none>
disk_io   sd6     R   131072      48096     0       <none>
disk_io   sd6     R   87552       48352     0       <none>
disk_io   sd21    R   43520       48267     0       <none>
disk_io   sd21    R   43520       48438     0       <none>
disk_io   sd6     R   87552       48523     0       <none>
disk_io   sd6     R   43520       48951     0       <none>
disk_io   sd21    R   87552       49891     0       <none>
disk_io   sd21    R   44032       50062     0       <none>
disk_io   sd6     R   87552       49378     0       <none>
disk_io   sd13    R   43520       47668     0       <none>
disk_io   sd13    R   43520       47839     0       <none>
disk_io   sd13    R   87552       48608     0       <none>
disk_io   sd13    R   43520       49036     0       <none>
disk_io   sd13    R   131072      49207     0       <none>
disk_io   sd13    R   43520       50062     0       <none>
disk_io   sd13    R   87552       49463     0       <none>

Thanks
Manoj Nayak
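The FSB rule quoted in the message above can be checked numerically: the
1052672-byte (1028K) read seen by dtrace at 128K recordsize spans 9 records,
so one logical read(2) necessarily fans out into many record-sized stripe
reads. A sketch of that arithmetic (hypothetical helper name, not ZFS code):

```python
RECORDSIZE = 128 * 1024  # 131072-byte default recordsize

def fsb_count(file_size: int, recordsize: int = RECORDSIZE) -> int:
    """Number of filesystem blocks backing a file, per the rule quoted
    above: files no larger than recordsize use one variable-length
    block; larger files use multiple recordsize-byte blocks."""
    if file_size <= recordsize:
        return 1
    return -(-file_size // recordsize)  # ceiling division

# The 1052672-byte sc-read covers 8 full 128K records plus a 4096-byte
# remainder, i.e. 9 records in total.
```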
Manoj Nayak writes:
> What's going on here? Why are so many small block reads issued?

Each record becomes its own RAID-Z stripe. When you read a 128K record,
ZFS needs to issue a 128K/3 I/O to each of the 3 data disks in the
4-disk RAID-Z group. With prefetching it's possible for the I/O
scheduler to aggregate these into larger I/Os, but it seems this is not
happening on your setup. How large are the I/Os? If they are smaller
than the above, is the pool rather full?

More reading:
http://blogs.sun.com/roch/entry/when_to_and_not_to (raidz)

-r
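The 128K/3 arithmetic above matches the odd sizes in the dtrace output.
Under a simplified model of RAID-Z column allocation (assumed here; real ZFS
also allocates parity columns and rounds allocations), a 128K record is 256
sectors split across the 3 data disks as 86+85+85 sectors:

```python
SECTOR = 512  # disk sector size in bytes

def raidz_data_columns(record_bytes: int, data_disks: int):
    """Split a record's sectors as evenly as possible across the data
    disks of a RAID-Z group (simplified model, parity ignored).
    Returns the per-disk I/O sizes in bytes."""
    sectors = record_bytes // SECTOR
    base, extra = divmod(sectors, data_disks)
    return [(base + (1 if i < extra else 0)) * SECTOR
            for i in range(data_disks)]

cols = raidz_data_columns(128 * 1024, 3)
# cols == [44032, 43520, 43520] -- exactly the 43520- and 44032-byte
# reads in the trace. The 87552-byte reads are two adjacent columns
# aggregated by the I/O scheduler (43520 + 44032), and 131072 is three.
```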
Roch - PAE wrote:
> How large are the I/Os? If they are smaller than the above, is the
> pool rather full?

The pool has plenty of space, as suggested by the zpool list output
below. I use one write() system call to write a 1028K buffer, then one
read() system call to read the 1028K buffer back.

# zpool list
NAME   SIZE   USED   AVAIL  CAP  HEALTH  ALTROOT
md0    931G   5.09G  926G   0%   ONLINE  -
md1    931G   190M   931G   0%   ONLINE  -
md10   931G   225K   931G   0%   ONLINE  -
md11   931G   225K   931G   0%   ONLINE  -
md2    931G   164M   931G   0%   ONLINE  -
md3    931G   164M   931G   0%   ONLINE  -
md4    931G   225K   931G   0%   ONLINE  -
md5    931G   225K   931G   0%   ONLINE  -
md6    931G   225K   931G   0%   ONLINE  -
md7    931G   225K   931G   0%   ONLINE  -
md8    931G   225K   931G   0%   ONLINE  -
md9    931G   225K   931G   0%   ONLINE  -

Thanks
Manoj Nayak
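The test described in the message above (a single write(2) of a 1028K
buffer, then a single read(2) of it) can be sketched as follows. The file
path is hypothetical; treat this as an illustrative reproduction under those
assumptions, not the original test program:

```python
import os

BUF_SIZE = 1028 * 1024  # 1052672 bytes, matching the sc-read size traced
PATH = "/md9/test/largefile"  # hypothetical file in the md9/test filesystem

def write_then_read(path: str = PATH) -> bytes:
    """Write one 1028K buffer with a single write(2) call, then read it
    back with a single read(2) call, as in the traced experiment."""
    buf = os.urandom(BUF_SIZE)
    fd = os.open(path, os.O_CREAT | os.O_WRONLY | os.O_TRUNC, 0o644)
    try:
        written = os.write(fd, buf)   # one write(2) system call
        assert written == BUF_SIZE
    finally:
        os.close(fd)
    fd = os.open(path, os.O_RDONLY)
    try:
        data = os.read(fd, BUF_SIZE)  # one read(2) system call
    finally:
        os.close(fd)
    return data
```

Running this on the RAID-Z filesystem while tracing disk_io events should
reproduce the fan-out of small per-disk reads shown earlier in the thread.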