Robert Milkowski
2007-Oct-18 14:55 UTC
[zfs-discuss] Sequential reading/writing from large stripe faster on SVM than ZFS?
Hi,

snv_74, x4500, 48x 500GB, 16GB RAM, 2x dual core

# zpool create test c0t0d0 c0t1d0 c0t2d0 c0t3d0 c0t4d0 c0t5d0 c0t6d0 c0t7d0 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 c1t6d0 c1t7d0 c4t0d0 c4t1d0 c4t2d0 c4t3d0 c4t4d0 c4t5d0 c4t6d0 c4t7d0 c5t1d0 c5t2d0 c5t3d0 c5t5d0 c5t6d0 c5t7d0 c6t0d0 c6t1d0 c6t2d0 c6t3d0 c6t4d0 c6t5d0 c6t6d0 c6t7d0 c7t0d0 c7t1d0 c7t2d0 c7t3d0 c7t4d0 c7t5d0 c7t6d0 c7t7d0
[46x 500GB]

# ls -lh /test/q1
-rw-r--r--   1 root     root         82G Oct 18 09:43 /test/q1

# dd if=/test/q1 of=/dev/null bs=16384k &
# zpool iostat 1
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
test         213G  20.6T    645    120  80.1M  14.7M
test         213G  20.6T  9.26K      0  1.16G      0
test         213G  20.6T  9.66K      0  1.21G      0
test         213G  20.6T  9.41K      0  1.18G      0
test         213G  20.6T  9.41K      0  1.18G      0
test         213G  20.6T  7.45K      0   953M      0
test         213G  20.6T  7.59K      0   971M      0
test         213G  20.6T  7.41K      0   948M      0
test         213G  20.6T  8.25K      0  1.03G      0
test         213G  20.6T  9.17K      0  1.15G      0
test         213G  20.6T  9.54K      0  1.19G      0
test         213G  20.6T  9.89K      0  1.24G      0
test         213G  20.6T  9.41K      0  1.18G      0
test         213G  20.6T  9.31K      0  1.16G      0
test         213G  20.6T  9.80K      0  1.22G      0
test         213G  20.6T  8.72K      0  1.09G      0
test         213G  20.6T  7.86K      0  1006M      0
test         213G  20.6T  7.21K      0   923M      0
test         213G  20.6T  7.62K      0   975M      0
test         213G  20.6T  8.68K      0  1.08G      0
test         213G  20.6T  9.81K      0  1.23G      0
test         213G  20.6T  9.57K      0  1.20G      0

So it's around 1 GB/s.

# dd if=/dev/zero of=/test/q10 bs=128k &
# zpool iostat 1
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
test         223G  20.6T    656    170  81.5M  20.8M
test         223G  20.6T      0  8.10K      0  1021M
test         223G  20.6T      0  7.94K      0  1001M
test         216G  20.6T      0  6.53K      0   812M
test         216G  20.6T      0  7.19K      0   906M
test         216G  20.6T      0  6.78K      0   854M
test         216G  20.6T      0  7.88K      0   993M
test         216G  20.6T      0  10.3K      0  1.27G
test         222G  20.6T      0  8.61K      0  1.04G
test         222G  20.6T      0  7.30K      0   919M
test         222G  20.6T      0  8.16K      0  1.00G
test         222G  20.6T      0  8.82K      0  1.09G
test         225G  20.6T      0  4.19K      0   511M
test         225G  20.6T      0  10.2K      0  1.26G
test         225G  20.6T      0  9.15K      0  1.13G
test         225G  20.6T      0  8.46K      0  1.04G
test         225G  20.6T      0  8.48K      0  1.04G
test         225G  20.6T      0  10.9K      0  1.33G
test         231G  20.6T      0      3      0  3.96K
test         231G  20.6T      0      0      0      0
test         231G  20.6T      0      0      0      0
test         231G  20.6T      0  9.02K      0  1.11G
test         231G  20.6T      0  12.2K      0  1.50G
test         231G  20.6T      0  9.14K      0  1.13G
test         231G  20.6T      0  10.3K      0  1.27G
test         231G  20.6T      0  9.08K      0  1.10G
test         237G  20.6T      0      0      0      0
test         237G  20.6T      0      0      0      0
test         237G  20.6T      0  6.03K      0   760M
test         237G  20.6T      0  9.18K      0  1.13G
test         237G  20.6T      0  8.40K      0  1.03G
test         237G  20.6T      0  8.45K      0  1.04G
test         237G  20.6T      0  11.1K      0  1.36G

Well, writing could be faster than reading here... there are gaps, due to bug 6415647 I guess.

# zpool destroy test

# metainit d100 1 46 c0t0d0s0 c0t1d0s0 c0t2d0s0 c0t3d0s0 c0t4d0s0 c0t5d0s0 c0t6d0s0 c0t7d0s0 c1t0d0s0 c1t1d0s0 c1t2d0s0 c1t3d0s0 c1t4d0s0 c1t5d0s0 c1t6d0s0 c1t7d0s0 c4t0d0s0 c4t1d0s0 c4t2d0s0 c4t3d0s0 c4t4d0s0 c4t5d0s0 c4t6d0s0 c4t7d0s0 c5t1d0s0 c5t2d0s0 c5t3d0s0 c5t5d0s0 c5t6d0s0 c5t7d0s0 c6t0d0s0 c6t1d0s0 c6t2d0s0 c6t3d0s0 c6t4d0s0 c6t5d0s0 c6t6d0s0 c6t7d0s0 c7t0d0s0 c7t1d0s0 c7t2d0s0 c7t3d0s0 c7t4d0s0 c7t5d0s0 c7t6d0s0 c7t7d0s0 -i 128k
d100: Concat/Stripe is setup
[46x 500GB]

And I get not-so-good results - a maximum of about 1 GB/s of reading... hmmmm...

maxphys is 56K - I thought it was increased some time ago on x86!

Still no performance increase.
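As an aside, the maxphys value in use can be confirmed on the live kernel, and raised via /etc/system (reboot required). This is only a sketch: the 1 MB value below is purely illustrative, not a recommendation, and md:md_maxphys is the corresponding SVM-side tunable.

# echo "maxphys/D" | mdb -k

/etc/system (illustrative values, reboot to apply):
set maxphys=0x100000
set md:md_maxphys=0x100000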
# metainit d101 -r c0t0d0s0 c1t0d0s0 c4t0d0s0 c6t0d0s0 c7t0d0s0 -i 128k
# metainit d102 -r c0t1d0s0 c1t1d0s0 c5t1d0s0 c6t1d0s0 c7t1d0s0 -i 128k
# metainit d103 -r c0t2d0s0 c1t2d0s0 c5t2d0s0 c6t2d0s0 c7t2d0s0 -i 128k
# metainit d104 -r c0t4d0s0 c1t4d0s0 c4t4d0s0 c6t4d0s0 c7t4d0s0 -i 128k
# metainit d105 -r c0t3d0s0 c1t3d0s0 c4t3d0s0 c5t3d0s0 c6t3d0s0 c7t3d0s0 -i 128k
# metainit d106 -r c0t5d0s0 c1t5d0s0 c4t5d0s0 c5t5d0s0 c6t5d0s0 c7t5d0s0 -i 128k
# metainit d107 -r c0t6d0s0 c1t6d0s0 c4t6d0s0 c5t6d0s0 c6t6d0s0 c7t6d0s0 -i 128k
# metainit d108 -r c0t7d0s0 c1t7d0s0 c4t7d0s0 c5t7d0s0 c6t7d0s0 c7t7d0s0 -i 128k

# iostat -xnzCM 1 | egrep "device| c[0-7]$"
[...]
                    extended device statistics
    r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0  362.0    0.0  362.0  0.0  7.0    0.0   19.3   0 698 c0
    0.0  377.0    0.0  377.0  0.0  7.0    0.0   18.5   0 698 c1
    0.0  320.0    0.0  320.0  0.0  6.0    0.0   18.7   0 598 c4
    0.0  268.0    0.0  268.0  0.0  5.0    0.0   18.6   0 499 c5
    0.0  372.0    0.0  372.0  0.0  7.0    0.0   18.8   0 698 c6
    0.0  374.0    0.0  374.0  0.0  7.0    0.0   18.7   0 698 c7

Sometimes I get even more - around 2.3 GB/s.

The question is: why can't I get that kind of performance with a single ZFS pool (striping across all the disks)? A concurrency problem, or something else?
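The post shows the iostat numbers for the d101-d108 test but not the load generator, so the script below is just one plausible way the streams could be driven: one sequential writer per RAID-0 metadevice, all in parallel. It is a sketch only - the block size and count are arbitrary, and writing to the raw metadevices destroys whatever is on them, so scratch disks only.

#!/bin/sh
# One sequential writer per RAID-0 metadevice, all running in parallel.
# WARNING: overwrites the raw metadevices - scratch disks only.
for d in d101 d102 d103 d104 d105 d106 d107 d108; do
        dd if=/dev/zero of=/dev/md/rdsk/$d bs=1024k count=10240 &   # ~10 GB per stripe
done
wait

Note that each physical I/O issued to a raw device is still capped by maxphys, which is why the 56 KB value mentioned above matters for this path.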
Robert Milkowski
2007-Oct-18 16:16 UTC
[zfs-discuss] Sequential reading/writing from large stripe faster on SVM than ZFS?
zpool create t1 c0t0d0 c0t1d0 c0t2d0 c0t3d0 c0t4d0 c0t5d0 c0t6d0 c0t7d0
zpool create t2 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 c1t6d0 c1t7d0
zpool create t3 c4t0d0 c4t1d0 c4t2d0 c4t3d0 c4t4d0 c4t5d0 c4t6d0 c4t7d0
zpool create t4 c5t1d0 c5t2d0 c5t3d0 c5t5d0 c5t6d0 c5t7d0
zpool create t5 c6t0d0 c6t1d0 c6t2d0 c6t3d0 c6t4d0 c6t5d0 c6t6d0 c6t7d0
zpool create t6 c7t0d0 c7t1d0 c7t2d0 c7t3d0 c7t4d0 c7t5d0 c7t6d0 c7t7d0

zfs set atime=off t1
zfs set atime=off t2
zfs set atime=off t3
zfs set atime=off t4
zfs set atime=off t5
zfs set atime=off t6

dd if=/dev/zero of=/t1/q1 bs=512k&
[1] 903
dd if=/dev/zero of=/t2/q1 bs=512k&
[2] 908
dd if=/dev/zero of=/t3/q1 bs=512k&
[3] 909
dd if=/dev/zero of=/t4/q1 bs=512k&
[4] 910
dd if=/dev/zero of=/t5/q1 bs=512k&
[5] 911
dd if=/dev/zero of=/t6/q1 bs=512k&
[6] 912

zpool iostat 1
[...]
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
t1          20.1G  3.61T      0  3.19K      0   405M
t2          12.9G  3.61T      0  2.38K      0   302M
t3          8.51G  3.62T      0  2.79K  63.4K   357M
t4          5.19G  2.71T      0  1.39K  63.4K   170M
t5          1.96G  3.62T      0  2.65K      0   336M
t6          1.29G  3.62T      0  1.05K  63.4K   127M
----------  -----  -----  -----  -----  -----  -----
t1          20.1G  3.61T      0  3.77K      0   483M
t2          12.9G  3.61T      0  3.49K      0   446M
t3          8.51G  3.62T      0  2.36K  63.3K   295M
t4          5.19G  2.71T      0  2.84K      0   359M
t5          2.29G  3.62T      0     97  62.7K   494K
t6          1.29G  3.62T      0  4.03K      0   510M
----------  -----  -----  -----  -----  -----  -----

iostat -xnzCM 1 | egrep "device| c[0-7]$"
[...]
                    extended device statistics
    r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0 5277.8    0.0  659.7  0.6 120.2    0.1   22.8   1 646 c0
    0.0 5625.7    0.0  703.2  0.1 116.7    0.0   20.7   0 691 c1
    0.0 4806.7    0.0  599.4  0.0  83.9    0.0   17.4   0 582 c4
    0.0 2457.4    0.0  307.2  3.3 134.9    1.3   54.9   2 600 c5
    0.0 3882.8    0.0  485.3  0.4 157.1    0.1   40.5   0 751 c7

So right now I'm getting up to 2.7 GB/s. It's still jumpy (I provided only peak outputs), but it's much better than with one large pool - let's try again:

# zpool create test c0t0d0 c0t1d0 c0t2d0 c0t3d0 c0t4d0 c0t5d0 c0t6d0 c0t7d0 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 c1t6d0 c1t7d0 c4t0d0 c4t1d0 c4t2d0 c4t3d0 c4t4d0 c4t5d0 c4t6d0 c4t7d0 c5t1d0 c5t2d0 c5t3d0 c5t5d0 c5t6d0 c5t7d0 c6t0d0 c6t1d0 c6t2d0 c6t3d0 c6t4d0 c6t5d0 c6t6d0 c6t7d0 c7t0d0 c7t1d0 c7t2d0 c7t3d0 c7t4d0 c7t5d0 c7t6d0 c7t7d0

zfs set atime=off test

dd if=/dev/zero of=/test/q1 bs=512k&
dd if=/dev/zero of=/test/q2 bs=512k&
dd if=/dev/zero of=/test/q3 bs=512k&
dd if=/dev/zero of=/test/q4 bs=512k&
dd if=/dev/zero of=/test/q5 bs=512k&
dd if=/dev/zero of=/test/q6 bs=512k&

iostat -xnzCM 1 | egrep "device| c[0-7]$"
[...]
                    extended device statistics
    r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0 1891.9    0.0  233.0 11.7 13.5    6.2    7.1   3 374 c0
    0.0 1944.9    0.0  239.5 10.9 14.0    5.6    7.2   3 350 c1
    7.0 1897.9    0.1  233.0 11.3 13.3    5.9    7.0   3 339 c4
   13.0 1455.9    0.2  178.5 13.2  6.1    9.0    4.2   3 226 c5
    0.0 1921.9    0.0  236.0  8.1 10.7    4.2    5.5   2 322 c6
    0.0 1919.9    0.0  236.0  7.8 10.5    4.1    5.5   2 321 c7

So it's about 1.3 GB/s - about half of what I get with multiple pools. It looks like a scalability problem with a single pool.
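For repeatability, the same six-pool experiment can be scripted with a fixed amount of data per stream so that the aggregate rate falls out of the wall-clock time. This is only a sketch: the pool names t1..t6 match the post, but the 16 GB per stream is an arbitrary choice.

#!/usr/bin/bash
# One writer per pool, 16 GB each; aggregate MB/s from elapsed time.
count=32768                              # 32768 * 512 KB = 16 GB per pool
for p in t1 t2 t3 t4 t5 t6; do
        dd if=/dev/zero of=/$p/q1 bs=512k count=$count &
done
wait                                     # wait for all six writers to finish
echo "aggregate: $(( 6 * 16 * 1024 / SECONDS )) MB/s over $SECONDS seconds"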
Mario Goebbels
2007-Oct-19 16:37 UTC
[zfs-discuss] Sequential reading/writing from large stripe faster on SVM than ZFS?
> The question is: why can't I get that kind of performance with a single ZFS
> pool (striping across all the disks)? A concurrency problem, or something else?

Remember that ZFS is checksumming everything on reads and writes.

-mg
Robert Milkowski
2007-Oct-19 18:11 UTC
[zfs-discuss] Sequential reading/writing from large stripe faster on SVM than ZFS?
Hello Mario,

Friday, October 19, 2007, 5:37:07 PM, you wrote:

>> The question is: why can't I get that kind of performance with a single ZFS
>> pool (striping across all the disks)? A concurrency problem, or something else?

MG> Remember that ZFS is checksumming everything on reads and writes.

I know - but if I divide that pool into 6 smaller ones (see my 2nd post), then the performance is much better. So it's not that the system can't cope because of the checksums; rather, it's that I can't get that kind of performance from a single pool. I know it's a "dd test", but anyway...

--
Best regards,
 Robert Milkowski                        mailto:rmilkowski at task.gda.pl
                                         http://milek.blogspot.com
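If anyone wants to test the checksum theory directly, the obvious experiment (for benchmarking only - never leave checksums off on data you care about) is to repeat the same write against a scratch dataset with checksumming disabled and compare. The dataset names below are made up for illustration; the pool name matches the first post.

# zfs create test/csum
# zfs create test/nocsum
# zfs set checksum=off test/nocsum

# dd if=/dev/zero of=/test/csum/q1 bs=128k count=81920
# dd if=/dev/zero of=/test/nocsum/q1 bs=128k count=81920

Each dd writes about 10 GB; watching zpool iostat 1 during both runs shows how much of the gap, if any, checksumming accounts for.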
Roch - PAE
2007-Oct-24 15:49 UTC
[zfs-discuss] Sequential reading/writing from large stripe faster on SVM than ZFS?
I would suspect the checksum part of this (I do believe it's being actively worked on):

        6533726 single-threaded checksum & raidz2 parity calculations
                limit write bandwidth on thumper

-r

Robert Milkowski writes:
 > Hi,
 >
 > snv_74, x4500, 48x 500GB, 16GB RAM, 2x dual core
 >
 > [...]
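Not part of the original thread, but an easy way to see this kind of bottleneck from the outside, while the single-pool dd write from the first post is running, is with the standard observability tools (nothing pool-specific assumed):

# mpstat 1
# lockstat -kIW -D 20 sleep 10

If one CPU sits near 100% in %sys while the others stay largely idle, and the kernel profile is dominated by checksum/parity routines, that matches the behaviour described in 6533726.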
Robert Milkowski
2007-Oct-29 17:42 UTC
[zfs-discuss] Sequential reading/writing from large stripe faster on SVM than ZFS?
Hello Roch,

Wednesday, October 24, 2007, 3:49:45 PM, you wrote:

RP> I would suspect the checksum part of this (I do believe it's being
RP> actively worked on):
RP>
RP>         6533726 single-threaded checksum & raidz2 parity calculations
RP>                 limit write bandwidth on thumper

I guess it's single-threaded per pool - that's why the performance was much better once I created multiple pools.

Thanks for the info.

--
Best regards,
 Robert                                  mailto:rmilkowski at task.gda.pl
                                         http://milek.blogspot.com