Robert Milkowski
2007-Oct-18 14:55 UTC
[zfs-discuss] Sequential reading/writing from large stripe faster on SVM than ZFS?
Hi,
snv_74, x4500, 48x 500GB, 16GB RAM, 2x dual core
# zpool create test c0t0d0 c0t1d0 c0t2d0 c0t3d0 c0t4d0 c0t5d0 c0t6d0 c0t7d0
c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 c1t6d0 c1t7d0 c4t0d0 c4t1d0 c4t2d0
c4t3d0 c4t4d0 c4t5d0 c4t6d0 c4t7d0 c5t1d0 c5t2d0 c5t3d0 c5t5d0 c5t6d0 c5t7d0
c6t0d0 c6t1d0 c6t2d0 c6t3d0 c6t4d0 c6t5d0 c6t6d0 c6t7d0 c7t0d0 c7t1d0 c7t2d0
c7t3d0 c7t4d0 c7t5d0 c7t6d0 c7t7d0
[46x 500GB]
# ls -lh /test/q1
-rw-r--r-- 1 root root 82G Oct 18 09:43 /test/q1
# dd if=/test/q1 of=/dev/null bs=16384k &
# zpool iostat 1
capacity operations bandwidth
pool used avail read write read write
---------- ----- ----- ----- ----- ----- -----
test 213G 20.6T 645 120 80.1M 14.7M
test 213G 20.6T 9.26K 0 1.16G 0
test 213G 20.6T 9.66K 0 1.21G 0
test 213G 20.6T 9.41K 0 1.18G 0
test 213G 20.6T 9.41K 0 1.18G 0
test 213G 20.6T 7.45K 0 953M 0
test 213G 20.6T 7.59K 0 971M 0
test 213G 20.6T 7.41K 0 948M 0
test 213G 20.6T 8.25K 0 1.03G 0
test 213G 20.6T 9.17K 0 1.15G 0
test 213G 20.6T 9.54K 0 1.19G 0
test 213G 20.6T 9.89K 0 1.24G 0
test 213G 20.6T 9.41K 0 1.18G 0
test 213G 20.6T 9.31K 0 1.16G 0
test 213G 20.6T 9.80K 0 1.22G 0
test 213G 20.6T 8.72K 0 1.09G 0
test 213G 20.6T 7.86K 0 1006M 0
test 213G 20.6T 7.21K 0 923M 0
test 213G 20.6T 7.62K 0 975M 0
test 213G 20.6T 8.68K 0 1.08G 0
test 213G 20.6T 9.81K 0 1.23G 0
test 213G 20.6T 9.57K 0 1.20G 0
So it's around 1 GB/s.
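As a cross-check independent of zpool iostat, the same read can be timed directly (a sketch, not part of the run above; ptime just reports real/user/sys time):

# ptime dd if=/test/q1 of=/dev/null bs=16384k

The 82 GB file size divided by the reported real time gives the average read rate.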
# dd if=/dev/zero of=/test/q10 bs=128k &
# zpool iostat 1
capacity operations bandwidth
pool used avail read write read write
---------- ----- ----- ----- ----- ----- -----
test 223G 20.6T 656 170 81.5M 20.8M
test 223G 20.6T 0 8.10K 0 1021M
test 223G 20.6T 0 7.94K 0 1001M
test 216G 20.6T 0 6.53K 0 812M
test 216G 20.6T 0 7.19K 0 906M
test 216G 20.6T 0 6.78K 0 854M
test 216G 20.6T 0 7.88K 0 993M
test 216G 20.6T 0 10.3K 0 1.27G
test 222G 20.6T 0 8.61K 0 1.04G
test 222G 20.6T 0 7.30K 0 919M
test 222G 20.6T 0 8.16K 0 1.00G
test 222G 20.6T 0 8.82K 0 1.09G
test 225G 20.6T 0 4.19K 0 511M
test 225G 20.6T 0 10.2K 0 1.26G
test 225G 20.6T 0 9.15K 0 1.13G
test 225G 20.6T 0 8.46K 0 1.04G
test 225G 20.6T 0 8.48K 0 1.04G
test 225G 20.6T 0 10.9K 0 1.33G
test 231G 20.6T 0 3 0 3.96K
test 231G 20.6T 0 0 0 0
test 231G 20.6T 0 0 0 0
test 231G 20.6T 0 9.02K 0 1.11G
test 231G 20.6T 0 12.2K 0 1.50G
test 231G 20.6T 0 9.14K 0 1.13G
test 231G 20.6T 0 10.3K 0 1.27G
test 231G 20.6T 0 9.08K 0 1.10G
test 237G 20.6T 0 0 0 0
test 237G 20.6T 0 0 0 0
test 237G 20.6T 0 6.03K 0 760M
test 237G 20.6T 0 9.18K 0 1.13G
test 237G 20.6T 0 8.40K 0 1.03G
test 237G 20.6T 0 8.45K 0 1.04G
test 237G 20.6T 0 11.1K 0 1.36G
Well, writing could be faster than reading here... there are gaps due to
bug 6415647, I guess.
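One rough way to see whether those write gaps line up with txg syncs is to watch how long each spa_sync takes while the dd runs (a sketch, assuming the fbt probes for spa_sync are available on this build):

# dtrace -n 'fbt::spa_sync:entry { self->t = timestamp; }
  fbt::spa_sync:return /self->t/ { @["spa_sync time (ms)"] =
  quantize((timestamp - self->t) / 1000000); self->t = 0; }'

Multi-second buckets in the histogram during the stalls would point at txg sync as the pause.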
# zpool destroy test
# metainit d100 1 46 c0t0d0s0 c0t1d0s0 c0t2d0s0 c0t3d0s0 c0t4d0s0 c0t5d0s0
c0t6d0s0 c0t7d0s0 c1t0d0s0 c1t1d0s0 c1t2d0s0 c1t3d0s0 c1t4d0s0 c1t5d0s0 c1t6d0s0
c1t7d0s0 c4t0d0s0 c4t1d0s0 c4t2d0s0 c4t3d0s0 c4t4d0s0 c4t5d0s0 c4t6d0s0 c4t7d0s0
c5t1d0s0 c5t2d0s0 c5t3d0s0 c5t5d0s0 c5t6d0s0 c5t7d0s0 c6t0d0s0 c6t1d0s0 c6t2d0s0
c6t3d0s0 c6t4d0s0 c6t5d0s0 c6t6d0s0 c6t7d0s0 c7t0d0s0 c7t1d0s0 c7t2d0s0 c7t3d0s0
c7t4d0s0 c7t5d0s0 c7t6d0s0 c7t7d0s0 -i 128k
d100: Concat/Stripe is setup
[46x 500GB]
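(How the read on the SVM stripe was driven isn't shown in the post; presumably something along these lines against the raw metadevice - an assumed illustration only:)

# dd if=/dev/md/rdsk/d100 of=/dev/null bs=16384k    # assumed command, not shown in the original post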
And I get not-so-good results - a maximum of about 1 GB/s of reading... hmmm...
maxphys is 56K - I thought that was increased some time ago on x86!
Still no performance increase.
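For what it's worth, the current value can be read with mdb, and it can be raised in /etc/system (reboot required). The md:md_maxphys name for the SVM-side limit is from memory, so treat it as an assumption:

# echo 'maxphys/D' | mdb -k

/etc/system:
set maxphys=0x100000
set md:md_maxphys=0x100000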
# metainit d101 -r c0t0d0s0 c1t0d0s0 c4t0d0s0 c6t0d0s0 c7t0d0s0 -i 128k
# metainit d102 -r c0t1d0s0 c1t1d0s0 c5t1d0s0 c6t1d0s0 c7t1d0s0 -i 128k
# metainit d103 -r c0t2d0s0 c1t2d0s0 c5t2d0s0 c6t2d0s0 c7t2d0s0 -i 128k
# metainit d104 -r c0t4d0s0 c1t4d0s0 c4t4d0s0 c6t4d0s0 c7t4d0s0 -i 128k
# metainit d105 -r c0t3d0s0 c1t3d0s0 c4t3d0s0 c5t3d0s0 c6t3d0s0 c7t3d0s0 -i 128k
# metainit d106 -r c0t5d0s0 c1t5d0s0 c4t5d0s0 c5t5d0s0 c6t5d0s0 c7t5d0s0 -i 128k
# metainit d107 -r c0t6d0s0 c1t6d0s0 c4t6d0s0 c5t6d0s0 c6t6d0s0 c7t6d0s0 -i 128k
# metainit d108 -r c0t7d0s0 c1t7d0s0 c4t7d0s0 c5t7d0s0 c6t7d0s0 c7t7d0s0 -i 128k
# iostat -xnzCM 1 | egrep "device| c[0-7]$"
[...]
extended device statistics
r/s w/s Mr/s Mw/s wait actv wsvc_t asvc_t %w %b device
0.0 362.0 0.0 362.0 0.0 7.0 0.0 19.3 0 698 c0
0.0 377.0 0.0 377.0 0.0 7.0 0.0 18.5 0 698 c1
0.0 320.0 0.0 320.0 0.0 6.0 0.0 18.7 0 598 c4
0.0 268.0 0.0 268.0 0.0 5.0 0.0 18.6 0 499 c5
0.0 372.0 0.0 372.0 0.0 7.0 0.0 18.8 0 698 c6
0.0 374.0 0.0 374.0 0.0 7.0 0.0 18.7 0 698 c7
Sometimes I get even more - around 2.3 GB/s.
The question is: why can't I get that kind of performance with a single
zfs pool (striping across all the disks)? A concurrency problem or something else?
Robert Milkowski
2007-Oct-18 16:16 UTC
[zfs-discuss] Sequential reading/writing from large stripe faster on SVM than ZFS?
zpool create t1 c0t0d0 c0t1d0 c0t2d0 c0t3d0 c0t4d0 c0t5d0 c0t6d0 c0t7d0
zpool create t2 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 c1t6d0 c1t7d0
zpool create t3 c4t0d0 c4t1d0 c4t2d0 c4t3d0 c4t4d0 c4t5d0 c4t6d0 c4t7d0
zpool create t4 c5t1d0 c5t2d0 c5t3d0 c5t5d0 c5t6d0 c5t7d0
zpool create t5 c6t0d0 c6t1d0 c6t2d0 c6t3d0 c6t4d0 c6t5d0 c6t6d0 c6t7d0
zpool create t6 c7t0d0 c7t1d0 c7t2d0 c7t3d0 c7t4d0 c7t5d0 c7t6d0 c7t7d0
zfs set atime=off t1
zfs set atime=off t2
zfs set atime=off t3
zfs set atime=off t4
zfs set atime=off t5
zfs set atime=off t6
dd if=/dev/zero of=/t1/q1 bs=512k&
[1] 903
dd if=/dev/zero of=/t2/q1 bs=512k&
[2] 908
dd if=/dev/zero of=/t3/q1 bs=512k&
[3] 909
dd if=/dev/zero of=/t4/q1 bs=512k&
[4] 910
dd if=/dev/zero of=/t5/q1 bs=512k&
[5] 911
dd if=/dev/zero of=/t6/q1 bs=512k&
[6] 912
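(The atime settings and the six writers above can of course be driven from one loop; equivalent to the individual commands, shown only for compactness.)

# for p in t1 t2 t3 t4 t5 t6; do
>   zfs set atime=off $p
>   dd if=/dev/zero of=/$p/q1 bs=512k &
> done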
zpool iostat 1
[...]
capacity operations bandwidth
pool used avail read write read write
---------- ----- ----- ----- ----- ----- -----
t1 20.1G 3.61T 0 3.19K 0 405M
t2 12.9G 3.61T 0 2.38K 0 302M
t3 8.51G 3.62T 0 2.79K 63.4K 357M
t4 5.19G 2.71T 0 1.39K 63.4K 170M
t5 1.96G 3.62T 0 2.65K 0 336M
t6 1.29G 3.62T 0 1.05K 63.4K 127M
---------- ----- ----- ----- ----- ----- -----
t1 20.1G 3.61T 0 3.77K 0 483M
t2 12.9G 3.61T 0 3.49K 0 446M
t3 8.51G 3.62T 0 2.36K 63.3K 295M
t4 5.19G 2.71T 0 2.84K 0 359M
t5 2.29G 3.62T 0 97 62.7K 494K
t6 1.29G 3.62T 0 4.03K 0 510M
---------- ----- ----- ----- ----- ----- -----
iostat -xnzCM 1 | egrep "device| c[0-7]$"
[...]
extended device statistics
r/s w/s Mr/s Mw/s wait actv wsvc_t asvc_t %w %b device
0.0 5277.8 0.0 659.7 0.6 120.2 0.1 22.8 1 646 c0
0.0 5625.7 0.0 703.2 0.1 116.7 0.0 20.7 0 691 c1
0.0 4806.7 0.0 599.4 0.0 83.9 0.0 17.4 0 582 c4
0.0 2457.4 0.0 307.2 3.3 134.9 1.3 54.9 2 600 c5
0.0 3882.8 0.0 485.3 0.4 157.1 0.1 40.5 0 751 c7
So right now I'm getting up to 2.7 GB/s.
It's still jumpy (I provided only peak outputs), but it's much
better than one large pool - let's try again:
# zpool create test c0t0d0 c0t1d0 c0t2d0 c0t3d0 c0t4d0 c0t5d0 c0t6d0 c0t7d0
c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 c1t6d0 c1t7d0 c4t0d0 c4t1d0 c4t2d0
c4t3d0 c4t4d0 c4t5d0 c4t6d0 c4t7d0 c5t1d0 c5t2d0 c5t3d0 c5t5d0 c5t6d0 c5t7d0
c6t0d0 c6t1d0 c6t2d0 c6t3d0 c6t4d0 c6t5d0 c6t6d0 c6t7d0 c7t0d0 c7t1d0 c7t2d0
c7t3d0 c7t4d0 c7t5d0 c7t6d0 c7t7d0
zfs set atime=off test
dd if=/dev/zero of=/test/q1 bs=512k&
dd if=/dev/zero of=/test/q2 bs=512k&
dd if=/dev/zero of=/test/q3 bs=512k&
dd if=/dev/zero of=/test/q4 bs=512k&
dd if=/dev/zero of=/test/q5 bs=512k&
dd if=/dev/zero of=/test/q6 bs=512k&
iostat -xnzCM 1 | egrep "device| c[0-7]$"
[...]
extended device statistics
r/s w/s Mr/s Mw/s wait actv wsvc_t asvc_t %w %b device
0.0 1891.9 0.0 233.0 11.7 13.5 6.2 7.1 3 374 c0
0.0 1944.9 0.0 239.5 10.9 14.0 5.6 7.2 3 350 c1
7.0 1897.9 0.1 233.0 11.3 13.3 5.9 7.0 3 339 c4
13.0 1455.9 0.2 178.5 13.2 6.1 9.0 4.2 3 226 c5
0.0 1921.9 0.0 236.0 8.1 10.7 4.2 5.5 2 322 c6
0.0 1919.9 0.0 236.0 7.8 10.5 4.1 5.5 2 321 c7
So it's about 1.3 GB/s - about half of what I get with more pools.
Looks like a scalability problem with a single pool.
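If a single thread inside the pool is the limiter, per-CPU and per-thread statistics taken while the single-pool dd runs should show it - one CPU (or one thread) pinned while the rest sit mostly idle. A quick check along these lines (a suggestion, not from the run above):

# mpstat 1
# prstat -mL 1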
Mario Goebbels
2007-Oct-19 16:37 UTC
[zfs-discuss] Sequential reading/writing from large stripe faster on SVM than ZFS?
> The question is: why can't I get that kind of performance with a single
> zfs pool (striping across all the disks)? A concurrency problem or something else?

Remember that ZFS is checksumming everything on reads and writes.

-mg
Robert Milkowski
2007-Oct-19 18:11 UTC
[zfs-discuss] Sequential reading/writing from large stripe faster on SVM than ZFS?
Hello Mario,

Friday, October 19, 2007, 5:37:07 PM, you wrote:

>> The question is: why can't I get that kind of performance with a single
>> zfs pool (striping across all the disks)? A concurrency problem or something else?

MG> Remember that ZFS is checksumming everything on reads and writes.

I know - but if I divide that pool into 6 smaller ones (see my 2nd post), the
performance is much better. So it's not that the system can't cope because of
the checksums; it's rather that I can't get that kind of performance out of a
single pool. I know it's a "dd test", but anyway...

--
Best regards,
Robert Milkowski
mailto:rmilkowski at task.gda.pl
http://milek.blogspot.com
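One way to isolate the checksum cost on the single pool would be to switch checksums off on the test dataset, rerun the same dd writers, and compare (a benchmark-only setting, obviously; a sketch):

# zfs set checksum=off test
  (rerun the six dd writers and compare zpool iostat)
# zfs inherit checksum test

If the single-pool numbers barely move with checksums off, the limit is elsewhere in the per-pool write path.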
Roch - PAE
2007-Oct-24 15:49 UTC
[zfs-discuss] Sequential reading/writing from large stripe faster on SVM than ZFS?
I would suspect the checksum part of this (I do believe it's being actively
worked on):

6533726 single-threaded checksum & raidz2 parity calculations limit write
bandwidth on thumper

-r

Robert Milkowski writes:
> [...]
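If 6533726 is what's biting here, kernel profiling during the single-pool writes should show the checksum routines near the top; the usual lockstat profile is a quick way to look (a sketch, not part of the original thread):

# lockstat -kIW -D 20 sleep 30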
Robert Milkowski
2007-Oct-29 17:42 UTC
[zfs-discuss] Sequential reading/writing from large stripe faster on SVM than ZFS?
Hello Roch,
Wednesday, October 24, 2007, 3:49:45 PM, you wrote:
RP> I would suspect the checksum part of this (I do believe it's being
RP> actively worked on):
RP> 6533726 single-threaded checksum & raidz2 parity
RP> calculations limit write bandwidth on thumper

I guess it's single-threaded per pool - that's why once I created
multiple pools the performance was much better.

Thanks for the info.
--
Best regards,
Robert mailto:rmilkowski at task.gda.pl
http://milek.blogspot.com