Glinty McFrikknuts
2008-Feb-14 23:05 UTC
[zfs-discuss] 100% random writes coming out as 50/50 reads/writes
I'm running on s10s_u4wos_12b and doing the following test. Create a pool, striped across 4 physical disks from a storage array. Write a 100GB file to the filesystem (dd from /dev/zero out to the file). Run I/O against that file, doing 100% random writes with an 8K block size. zpool iostat shows the following...

                   capacity     operations    bandwidth
    pool         used  avail   read  write   read  write
    ----------  -----  -----  -----  -----  -----  -----
    testpl       305G  92.8G     10     83  1.25M  8.24M
    testpl       305G  92.8G    573  1.16K  69.4M  59.6M
    testpl       305G  92.8G    573  1.24K  70.2M  74.8M
    testpl       305G  92.8G    600    729  72.0M  31.3M
    testpl       305G  92.8G    448  1.23K  54.3M  70.1M
    testpl       305G  92.8G    576  1.39K  70.1M  76.3M

The I/O stats from the array show the same. Running truss against the I/O tool shows that it is only doing writes...

    /18:  pwrite64(6, "12 4 V x\003E9BD ODB7F F".., 8192, 0x00000006FB4FC000) = 8192
    /9:   pwrite64(6, "12 4 V x\003DCD9EB9F8CCA".., 8192, 0x0000000D3AFC6000) = 8192
    /10:  pwrite64(6, "12 4 V x\003DFC502 :AD ^".., 8192, 0x000000075ABF0000) = 8192
    /12:  pwrite64(6, "12 4 V x\003E09D\bC5\0E6".., 8192, 0x0000000CF8A9A000) = 8192
    /11:  pwrite64(6, "12 4 V x\003DFFD03ECA006".., 8192, 0xDD1C8000)         = 8192
    /5:   pwrite64(6, "12 4 V x\003D7 eC19CA5 >".., 8192, 0x49E92000)         = 8192
    /8:   pwrite64(6, "12 4 V x\003DB +DEB0 *BA".., 8192, 0x000000074FCB2000) = 8192
    /4:   pwrite64(6, "12 4 V x\003D6\rB7 K92B6".., 8192, 0x0000000295E1C000) = 8192
    /3:   pwrite64(6, "12 4 V x\003D5 }B2FB1486".., 8192, 0x000000118B862000) = 8192
    /14:  pwrite64(6, "12 4 V x\003E599 /84FB\n".., 8192, 0x00000003DFCD4000) = 8192
    /6:   pwrite64(6, "12 4 V x\003DA9DDA a9EE6".., 8192, 0x000000105DA36000) = 8192
    /17:  pwrite64(6, "12 4 V x\003E7F5 AEDF8 n".., 8192, 0x160CC000)         = 8192

Is this normal?

Thanks

This message posted from opensolaris.org
Anton B. Rang
2008-Feb-15 04:02 UTC
[zfs-discuss] 100% random writes coming out as 50/50 reads/writes
> Create a pool [ ... ]
> Write a 100GB file to the filesystem [ ... ]
> Run I/O against that file, doing 100% random writes with an 8K block size.

Did you set the record size of the filesystem to 8K?

If not, each 8K write will first read 128K, then write 128K.

Anton
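To put rough numbers on that (a back-of-the-envelope sketch, assuming the default 128K recordsize and a full record read-modify-write per sub-record write — the values below are illustrative, not measured from the pool above):

```shell
# Sketch of the amplification described above, assuming the
# default 128K ZFS recordsize. Numbers are illustrative only.
recordsize=$((128 * 1024))   # filesystem recordsize (default 128K)
iosize=$((8 * 1024))         # application write size (8K)

# A sub-record overwrite must read the whole record, modify it
# in memory, and write the whole record back out.
read_bytes=$recordsize
write_bytes=$recordsize
amplification=$(( (read_bytes + write_bytes) / iosize ))

echo "bytes read per 8K write:    $read_bytes"
echo "bytes written per 8K write: $write_bytes"
echo "amplification factor:       ${amplification}x"
```

With recordsize=8K, an aligned 8K write replaces exactly one record and needs no prior read, which is why the reads largely disappear once the filesystem is set up that way.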
Richard Elling
2008-Feb-15 05:42 UTC
[zfs-discuss] 100% random writes coming out as 50/50 reads/writes
Anton B. Rang wrote:
>> Create a pool [ ... ]
>> Write a 100GB file to the filesystem [ ... ]
>> Run I/O against that file, doing 100% random writes with an 8K block size.
>
> Did you set the record size of the filesystem to 8K?
>
> If not, each 8K write will first read 128K, then write 128K.

Also check to see that your 8kByte random writes are aligned on 8kByte boundaries, otherwise you'll be doing a read-modify-write.
 -- richard
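A quick way to see the alignment point (a sketch, using one of the offsets from the truss output in the first post): with an 8K recordsize, an 8K write only replaces a whole record when its file offset is a multiple of 8K; otherwise it straddles two records and each one is read, modified, and rewritten.

```shell
# Check whether a write offset is aligned to the recordsize.
# The offset below is copied from the truss output above.
recordsize=$((8 * 1024))
offset=$((0x00000006FB4FC000))

if [ $(( offset % recordsize )) -eq 0 ]; then
    echo "aligned: the write replaces one whole record"
else
    echo "unaligned: the write straddles two records (read-modify-write)"
fi
```

As it happens, every offset shown in the truss trace is a multiple of 8K, so alignment was not the issue in this particular case — the 128K recordsize was.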
Nathan Kroenert
2008-Feb-15 05:48 UTC
[zfs-discuss] 100% random writes coming out as 50/50 reads/writes
And something I was told only recently - It makes a difference if you created the file *before* you set the recordsize property.

If you created them after, then no worries, but if I understand correctly, if the *file* was created with 128K recordsize, then it'll keep that forever...

Assuming I understand correctly.

Hopefully someone else on the list will be able to confirm.

Cheers!

Nathan.

Richard Elling wrote:
> Anton B. Rang wrote:
>>> Create a pool [ ... ]
>>> Write a 100GB file to the filesystem [ ... ]
>>> Run I/O against that file, doing 100% random writes with an 8K block size.
>>
>> Did you set the record size of the filesystem to 8K?
>>
>> If not, each 8K write will first read 128K, then write 128K.
>
> Also check to see that your 8kByte random writes are aligned on 8kByte
> boundaries, otherwise you'll be doing a read-modify-write.
> -- richard
>
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss at opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Neil Perrin
2008-Feb-15 15:21 UTC
[zfs-discuss] 100% random writes coming out as 50/50 reads/writes
Nathan Kroenert wrote:
> And something I was told only recently - It makes a difference if you
> created the file *before* you set the recordsize property.
>
> If you created them after, then no worries, but if I understand
> correctly, if the *file* was created with 128K recordsize, then it'll
> keep that forever...
>
> Assuming I understand correctly.
>
> Hopefully someone else on the list will be able to confirm.

Yes, that is correct.

Neil.
Richard Elling
2008-Feb-15 22:48 UTC
[zfs-discuss] 100% random writes coming out as 50/50 reads/writes
Nathan Kroenert wrote:
> And something I was told only recently - It makes a difference if you
> created the file *before* you set the recordsize property.

Actually, it has always been true for RAID-0, RAID-5, RAID-6. If your I/O strides over two sets then you end up doing more I/O, perhaps twice as much.

> If you created them after, then no worries, but if I understand
> correctly, if the *file* was created with 128K recordsize, then it'll
> keep that forever...

Files have nothing to do with it. The recordsize is a file system parameter. It gets a little more complicated because the recordsize is actually the maximum recordsize, not the minimum.
 -- richard
Mattias Pantzare
2008-Feb-16 00:34 UTC
[zfs-discuss] 100% random writes coming out as 50/50 reads/writes
> > If you created them after, then no worries, but if I understand
> > correctly, if the *file* was created with 128K recordsize, then it'll
> > keep that forever...
>
> Files have nothing to do with it. The recordsize is a file system
> parameter. It gets a little more complicated because the recordsize
> is actually the maximum recordsize, not the minimum.

Please read the manpage:

    Changing the file system's recordsize only affects files
    created afterward; existing files are unaffected.

Nothing is rewritten in the file system when you change recordsize, so it stays the same for existing files.
Nathan Kroenert
2008-Feb-16 02:20 UTC
[zfs-discuss] 100% random writes coming out as 50/50 reads/writes
What about new blocks written to an existing file?

Perhaps we could make that clearer in the manpage too... hm.

Mattias Pantzare wrote:
>> > If you created them after, then no worries, but if I understand
>> > correctly, if the *file* was created with 128K recordsize, then it'll
>> > keep that forever...
>>
>> Files have nothing to do with it. The recordsize is a file system
>> parameter. It gets a little more complicated because the recordsize
>> is actually the maximum recordsize, not the minimum.
>
> Please read the manpage:
>
>     Changing the file system's recordsize only affects files
>     created afterward; existing files are unaffected.
>
> Nothing is rewritten in the file system when you change recordsize, so
> it stays the same for existing files.
Nathan Kroenert
2008-Feb-16 02:23 UTC
[zfs-discuss] 100% random writes coming out as 50/50 reads/writes
Hey, Richard -

I'm confused now. My understanding was that any files created after the recordsize was set would use that as the new maximum recordsize, but files already created would continue to use the old recordsize.

Though I'm now a little hazy on what will happen when those existing files are updated as well... hm.

Cheers!

Nathan.

Richard Elling wrote:
> Nathan Kroenert wrote:
>> And something I was told only recently - It makes a difference if you
>> created the file *before* you set the recordsize property.
>
> Actually, it has always been true for RAID-0, RAID-5, RAID-6.
> If your I/O strides over two sets then you end up doing more I/O,
> perhaps twice as much.
>
>> If you created them after, then no worries, but if I understand
>> correctly, if the *file* was created with 128K recordsize, then it'll
>> keep that forever...
>
> Files have nothing to do with it. The recordsize is a file system
> parameter. It gets a little more complicated because the recordsize
> is actually the maximum recordsize, not the minimum.
> -- richard
Glinty McFrikknuts
2008-Feb-19 19:19 UTC
[zfs-discuss] 100% random writes coming out as 50/50 reads/writes
Thanks for the suggestions. I re-created the pool, set the record size to 8K, re-created the file and increased the I/O size from the application. It's nearly all writes now.
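For anyone replaying this later, the fix boils down to setting recordsize before the data file exists. A hypothetical reconstruction of the steps (pool name, disk names, and file path are made up for illustration — substitute your own):

```shell
# Rebuild the pool; device names here are placeholders.
zpool destroy testpl
zpool create testpl c1t0d0 c1t1d0 c1t2d0 c1t3d0

# Set recordsize *before* creating the file. Existing files keep
# the recordsize they were created with, so order matters.
zfs set recordsize=8k testpl

# Re-create the ~100GB test file (819200 x 128K = 100GiB).
dd if=/dev/zero of=/testpl/testfile bs=128k count=819200
```

With the file's records matching the 8K random-write size (and the writes 8K-aligned), each write replaces a whole record and no prior read is needed, matching the "nearly all writes" result above.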