Hello zfs-discuss,

I have an NFS server with ZFS, used as a local file server. The system is snv_39 on SPARC. There are 6 raid-z pools (p1-p6).

The problem is that I do not see any heavy traffic on the network interfaces or in zpool iostat. However, plain old iostat shows MUCH more traffic going to the local disks. The difference is something like 10x - zpool iostat shows, for example, ~6MB/s of reads while iostat shows ~50MB/s. The question is: who's lying?

As the server is not performing that well, I suspect iostat is more accurate. Or maybe zpool iostat shows only 'application data' being transferred while iostat shows the 'real' I/Os going to the disks - but would the difference really be that big (checksums, what else)?

On the other hand, the traffic I see on the network interfaces is much closer to what zpool iostat reports. So maybe ZFS does introduce that much overhead after all and zpool iostat shows application data being transferred.

Clients mount the resources using NFSv3 over TCP. nfsd is set to 2048 threads - all of them are utilized for most of the day.

Below are iostat and zpool iostat outputs - both run at the same time in different terminals.

bash-3.00# iostat -xnzC 1 | egrep "devic| c4$"
                    extended device statistics
    r/s    w/s    kr/s    kw/s wait  actv wsvc_t asvc_t  %w   %b device
 1095.5 1390.7 64252.7 13742.5  0.0 137.5    0.0   55.3   0 1051 c4
  470.2 4394.0 28882.4  3462.0  0.0 748.7    0.0  153.9   0 5388 c4
  893.7 3262.0 55206.1  3124.8  0.0 680.4    0.0  163.7   0 6391 c4
  965.6 3043.7 61801.2  2727.4  0.0 358.0    0.0   89.3   0 5119 c4
 1162.8 2422.9 73277.0  5953.1  0.0 506.9    0.0  141.4   0 5390 c4
 1693.1 1292.4 98599.2  1806.1  0.0 538.3    0.0  180.3   0 5204 c4
 1551.7 1343.3 99808.4  1142.5  0.0 899.3    0.0  310.6   0 6300 c4
  624.2 4002.8 39899.0  3435.7  0.0 429.7    0.0   92.9   0 4048 c4
 1017.1 2735.7 65866.1  5809.7  0.0 325.9    0.0   86.8   0 4425 c4
 1038.9 2817.9 66914.1  4276.2  0.0 212.3    0.0   55.0   0 4241 c4
  784.4 3410.0 48851.0  9078.4  0.0 349.9    0.0   83.4   0 4579 c4
  732.3 3542.8 46408.7  8075.1  0.0 526.4    0.0  123.1   0 4075 c4
  928.1 3108.3 54917.9  7490.8  0.0 811.8    0.0  201.1   0 5750 c4
  931.0 2943.1 55627.1 10331.7  0.0 846.0    0.0  218.4   0 5795 c4
^C
bash-3.00#

bash-3.00# zpool iostat 1 capacity operations bandwidth pool used avail read write read write ---------- ----- ----- ----- ----- ----- ----- p1 751G 64.6G 79 72 1.15M 920K p2 738G 78.2G 64 92 1.06M 1.27M p3 733G 83.1G 61 98 1.12M 1.28M p4 665G 82.7G 5 11 51.8K 55.4K p5 704G 43.9G 80 61 1.09M 873K p6 697G 51.2G 73 67 1.04M 935K ---------- ----- ----- ----- ----- ----- ----- p1 751G 64.6G 13 128 276K 767K p2 738G 78.2G 16 129 1.47M 704K p3 733G 83.1G 10 192 388K 683K p4 665G 82.7G 16 3 37.1K 5.24K p5 704G 43.9G 11 172 34.2K 617K p6 697G 51.2G 12 35 31.4K 140K ---------- ----- ----- ----- ----- ----- ----- p1 751G 64.6G 5 87 27.2K 739K p2 738G 78.2G 15 93 39.6K 391K p3 733G 83.1G 15 151 51.5K 298K p4 665G 82.7G 73 27 1.07M 118K p5 704G 43.9G 41 62 1.85M 317K p6 697G 51.2G 16 152 75.8K 879K ---------- ----- ----- ----- ----- ----- ----- p1 751G 64.6G 39 83 211K 505K p2 738G 78.2G 27 76 562K 396K p3 733G 83.1G 38 77 109K 276K p4 665G 82.7G 0 1 0 6.67K p5 704G 43.9G 30 78 83.4K 596K p6 697G 51.2G 29 85 110K 702K ---------- ----- ----- ----- ----- ----- ----- p1 751G 64.6G 394 39 1018K 3.09M p2 738G 78.2G 12 157 29.0K 274K p3 733G 83.1G 2 109 12.8K 844K p4 665G 82.7G 3 4 14.3K 20.0K p5 704G 43.9G 32 44 85.2K 527K p6 697G 51.2G 62 47 3.93M 365K ---------- ----- ----- ----- ----- ----- ----- p1 751G 64.6G 159 0 421K 3.81K p2 738G 78.2G 18 86 174K 407K p3 733G 83.1G 28 89 121K 230K p4 665G 82.7G 94 17 7.27M 43.3K p5 
704G 43.9G 25 0 225K 0 p6 697G 51.2G 80 28 7.10M 810K ---------- ----- ----- ----- ----- ----- ----- p1 751G 64.6G 287 4 736K 34.2K p2 738G 78.2G 2 81 8.06K 389K p3 733G 83.1G 9 57 19.0K 493K p4 665G 82.7G 62 17 5.38M 70.6K p5 704G 43.9G 28 18 315K 152K p6 697G 51.2G 70 3 7.26M 133K ---------- ----- ----- ----- ----- ----- ----- p1 751G 64.6G 24 20 75.3K 239K p2 738G 78.2G 36 228 576K 477K p3 733G 83.1G 0 220 4.74K 662K p4 665G 82.7G 33 12 323K 35.1K p5 704G 43.9G 9 311 26.6K 1.28M p6 697G 51.2G 31 2 87.4K 11.4K ---------- ----- ----- ----- ----- ----- ----- p1 751G 64.6G 28 1 148K 11.4K p2 738G 78.2G 47 109 1.42M 1.11M p3 733G 83.1G 24 243 73.3K 661K p4 665G 82.7G 0 0 4.28K 0 p5 704G 43.9G 32 234 95.6K 1.32M p6 697G 51.2G 66 24 177K 2.23M ---------- ----- ----- ----- ----- ----- ----- p1 751G 64.6G 12 6 29.4K 64.5K p2 738G 78.2G 27 98 80.1K 1.46M p3 733G 83.1G 21 71 171K 795K p4 665G 82.7G 58 5 2.77M 19.4K p5 704G 43.9G 23 92 61.1K 470K p6 697G 51.2G 209 9 561K 428K ---------- ----- ----- ----- ----- ----- ----- p1 751G 64.6G 18 1 56.7K 125K p2 738G 78.2G 17 76 48.2K 4.76M p3 733G 83.1G 29 95 88.5K 1.07M p4 665G 82.7G 10 112 146K 197K p5 704G 43.9G 16 368 45.5K 815K p6 697G 51.2G 26 129 79.6K 844K ---------- ----- ----- ----- ----- ----- ----- p1 751G 64.6G 28 49 76.2K 4.49M p2 738G 78.2G 37 9 107K 209K p3 733G 83.1G 27 138 198K 938K p4 665G 82.7G 17 164 415K 258K p5 704G 43.9G 7 223 29.9K 930K p6 697G 51.2G 21 132 905K 458K ---------- ----- ----- ----- ----- ----- ----- ^C bash-3.00# Example full iostat output (all disks, iostat -xnz 1) extended device statistics r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device 1032.2 2751.8 64054.4 7491.7 0.0 682.4 0.0 180.3 0 5182 c4 21.2 45.5 1158.6 214.8 0.0 3.6 0.0 54.0 0 84 c4t500000E0118AC370d0 28.3 38.4 1494.2 208.8 0.0 4.8 0.0 71.7 0 84 c4t500000E0118B0390d0 11.1 64.7 711.7 302.8 0.0 10.9 0.0 143.2 0 100 c4t500000E0118F1FD0d0 1.0 0.0 64.7 0.0 0.0 0.1 0.0 74.5 0 8 c4t500000E011C19D60d0 1.0 0.0 64.7 0.0 0.0 0.1 0.0 76.2 0 8 c4t500000E0118C3220d0 47.5 2.0 3105.7 35.4 0.0 6.2 0.0 125.6 0 71 c4t500000E011902FA0d0 8.1 64.7 517.6 296.7 0.0 9.9 0.0 135.8 0 100 c4t500000E0118F2190d0 0.0 62.7 0.0 45.5 0.0 6.4 0.0 102.9 0 81 c4t500000E0119091E0d0 54.6 3.0 3558.6 35.4 0.0 8.3 0.0 143.2 0 80 c4t500000E011903120d0 10.1 61.7 647.0 271.4 0.0 6.6 0.0 92.2 0 94 c4t500000E0118F2350d0 1.0 61.7 64.7 45.5 0.0 6.4 0.0 101.8 0 84 c4t500000E0119032A0d0 47.5 2.0 3170.4 35.4 0.0 6.6 0.0 133.8 0 70 c4t500000E011903260d0 2.0 64.7 129.4 48.0 0.0 6.5 0.0 97.9 0 91 c4t500000E011909320d0 0.0 41.4 0.0 35.9 0.0 4.5 0.0 108.5 0 55 c4t500000E011903300d0 2.0 63.7 129.4 47.5 0.0 6.5 0.0 99.5 0 87 c4t500000E011909300d0 2.0 43.5 129.4 36.4 0.0 4.6 0.0 102.2 0 62 c4t500000E011903340d0 45.5 2.0 3041.0 35.4 0.0 6.4 0.0 135.0 0 70 c4t500000E011903320d0 1.0 62.7 64.7 46.5 0.0 6.4 0.0 100.8 0 84 c4t500000E0119095A0d0 22.2 45.5 1223.3 214.8 0.0 3.6 0.0 52.6 0 86 c4t500000E01192B420d0 37.4 2.0 2523.4 35.4 0.0 4.5 0.0 115.0 0 59 c4t500000E01190E6D0d0 9.1 55.6 582.3 283.6 0.0 5.3 0.0 81.4 0 86 c4t500000E01190E6B0d0 57.6 2.0 3752.7 35.4 0.0 8.9 0.0 149.6 0 82 c4t500000E01190E750d0 43.5 2.0 2846.9 35.4 0.0 5.3 0.0 116.4 0 66 c4t500000E01190E7F0d0 56.6 2.0 3752.7 35.4 0.0 8.4 0.0 144.0 0 81 c4t500000E01190E730d0 29.3 44.5 1558.9 212.8 0.0 5.0 0.0 67.1 0 96 c4t500000E01192B540d0 9.1 65.7 582.3 313.9 0.0 11.2 0.0 149.4 0 100 c4t500000E0118EDB20d0 0.0 78.9 0.0 75.3 0.0 35.0 0.0 443.8 0 100 c4t500000E0119495A0d0 1.0 0.0 64.7 0.0 0.0 0.1 0.0 85.8 0 9 c4t500000E01194A6F0d0 28.3 46.5 1611.5 213.8 
0.0 5.0 0.0 67.0 0 97 c4t500000E01194A610d0 10.1 63.7 711.7 300.3 0.0 8.5 0.0 114.9 0 100 c4t500000E0118EDCC0d0 12.1 61.7 776.4 300.8 0.0 11.0 0.0 149.7 0 100 c4t500000E0118EDCA0d0 0.0 78.9 0.0 76.3 0.0 35.0 0.0 443.8 0 100 c4t500000E01194A750d0 0.0 78.9 0.0 76.3 0.0 35.0 0.0 443.8 0 100 c4t500000E01194A710d0 0.0 78.9 0.0 79.4 0.0 35.0 0.0 443.8 0 100 c4t500000E01194A730d0 0.0 78.9 0.0 75.8 0.0 35.0 0.0 443.8 0 100 c4t500000E01194A810d0 25.3 42.5 1300.1 208.3 0.0 4.4 0.0 64.3 0 85 c4t500000E0118C3230d0 9.1 58.6 582.3 262.8 0.0 5.9 0.0 86.4 0 88 c4t500000E0118F2060d0 42.5 3.0 2782.2 35.4 0.0 5.7 0.0 125.7 0 66 c4t500000E011902FB0d0 43.5 2.0 2846.9 35.4 0.0 5.3 0.0 115.7 0 65 c4t500000E0119030D0d0 2.0 64.7 129.4 47.5 0.0 6.7 0.0 99.9 0 87 c4t500000E011903030d0 11.1 61.7 711.7 303.3 0.0 9.6 0.0 131.7 0 98 c4t500000E0118F21C0d0 7.1 59.6 452.9 270.9 0.0 5.2 0.0 78.4 0 90 c4t500000E0118F2180d0 43.5 2.0 2846.9 35.4 0.0 5.6 0.0 122.3 0 68 c4t500000E0119030F0d0 60.7 2.0 3946.8 35.4 0.0 10.1 0.0 160.7 0 85 c4t500000E0119031B0d0 1.0 41.4 64.7 33.9 0.0 4.6 0.0 108.7 0 56 c4t500000E011903190d0 3.0 58.6 194.1 43.5 0.0 6.4 0.0 104.1 0 80 c4t500000E0119032D0d0 1.0 62.7 64.7 46.5 0.0 6.5 0.0 102.7 0 83 c4t500000E011903350d0 3.0 59.6 194.1 44.0 0.0 6.5 0.0 103.6 0 82 c4t500000E011903370d0 27.3 37.4 1429.5 207.2 0.0 4.6 0.0 70.7 0 80 c4t500000E01192B150d0 1.0 0.0 64.7 0.0 0.0 0.1 0.0 80.8 0 8 c4t500000E01192B2F0d0 1.0 0.0 64.7 0.0 0.0 0.1 0.0 83.2 0 8 c4t500000E0118ABA70d0 2.0 0.0 129.4 0.0 0.0 0.2 0.0 96.4 0 19 c4t500000E01192B390d0 30.3 44.5 1623.6 212.8 0.0 4.9 0.0 65.8 0 96 c4t500000E01192B3B0d0 28.3 41.4 1494.2 211.3 0.0 4.7 0.0 67.7 0 86 c4t500000E0118ABC50d0 2.0 0.0 129.4 0.0 0.0 0.2 0.0 94.1 0 19 c4t500000E0118B7B10d0 25.3 37.4 1417.4 209.3 0.0 4.0 0.0 63.4 0 80 c4t500000E0119494D0d0 8.1 53.6 517.6 264.4 0.0 4.5 0.0 72.8 0 82 c4t500000E0118EDA10d0 0.0 77.8 0.0 78.3 0.0 35.0 0.0 449.6 0 100 c4t500000E011949570d0 22.2 37.4 1223.3 209.3 0.0 3.5 0.0 59.0 0 75 c4t500000E01194A620d0 0.0 77.8 0.0 77.8 0.0 35.0 0.0 449.6 0 100 c4t500000E01194A660d0 0.0 77.8 0.0 76.3 0.0 35.0 0.0 449.6 0 100 c4t500000E011949630d0 28.3 44.5 1611.5 211.8 0.0 4.4 0.0 60.3 0 95 c4t500000E01194A740d0 0.0 77.8 0.0 74.3 0.0 35.0 0.0 449.6 0 100 c4t500000E01194A780d0 0.0 77.8 0.0 76.3 0.0 35.0 0.0 449.6 0 100 c4t500000E01194A720d0 1.0 0.0 64.7 0.0 0.0 0.1 0.0 77.5 0 8 c4t500000E01194A760d0 2.0 0.0 129.4 0.0 0.0 0.2 0.0 91.1 0 18 c4t500000E01194A8A0d0 0.0 77.8 0.0 74.3 0.0 35.0 0.0 449.6 0 100 c4t500000E01194A8C0d0 0.0 0.0 0.0 0.0 0.0 2.0 0.0 0.0 0 100 c4t500000E01194A840d0 ^C bash-3.00# All pools have atime set to off, and sharenfs is set. Other than that rest parameters are default. 
bash-3.00# zfs list NAME USED AVAIL REFER MOUNTPOINT p1 752G 51.6G 53K /p1 p1/d5201 383G 17.0G 383G /p1/d5201 p1/d5202 368G 31.5G 368G /p1/d5202 p2 738G 65.4G 53K /p2 p2/d5203 376G 24.2G 376G /p2/d5203 p2/d5204 362G 38.1G 362G /p2/d5204 p3 733G 70.3G 53K /p3 p3/d5205 366G 33.8G 366G /p3/d5205 p3/d5206 367G 33.4G 367G /p3/d5206 p4 665G 71.1G 53K /p4 p4/d5207 328G 71.1G 328G /p4/d5207 p4/d5208 337G 62.9G 337G /p4/d5208 p5 704G 32.2G 53K /p5 p5/d5209 310G 32.2G 310G /p5/d5209 p5/d5210 393G 6.52G 393G /p5/d5210 p6 697G 39.5G 53K /p6 p6/d5211 394G 5.76G 394G /p6/d5211 p6/d5212 302G 39.5G 302G /p6/d5212 bash-3.00# bash-3.00# zpool status pool: p1 state: ONLINE scrub: none requested config: NAME STATE READ WRITE CKSUM p1 ONLINE 0 0 0 raidz ONLINE 0 0 0 c4t500000E011909320d0 ONLINE 0 0 0 c4t500000E011909300d0 ONLINE 0 0 0 c4t500000E011903030d0 ONLINE 0 0 0 c4t500000E011903300d0 ONLINE 0 0 0 c4t500000E0119091E0d0 ONLINE 0 0 0 c4t500000E0119032D0d0 ONLINE 0 0 0 c4t500000E011903370d0 ONLINE 0 0 0 c4t500000E011903190d0 ONLINE 0 0 0 c4t500000E011903350d0 ONLINE 0 0 0 c4t500000E0119095A0d0 ONLINE 0 0 0 c4t500000E0119032A0d0 ONLINE 0 0 0 c4t500000E011903340d0 ONLINE 0 0 0 errors: No known data errors pool: p2 state: ONLINE scrub: none requested config: NAME STATE READ WRITE CKSUM p2 ONLINE 0 0 0 raidz ONLINE 0 0 0 c4t500000E011902FB0d0 ONLINE 0 0 0 c4t500000E0119030F0d0 ONLINE 0 0 0 c4t500000E01190E730d0 ONLINE 0 0 0 c4t500000E01190E7F0d0 ONLINE 0 0 0 c4t500000E011903120d0 ONLINE 0 0 0 c4t500000E01190E750d0 ONLINE 0 0 0 c4t500000E0119031B0d0 ONLINE 0 0 0 c4t500000E0119030D0d0 ONLINE 0 0 0 c4t500000E011903260d0 ONLINE 0 0 0 c4t500000E011903320d0 ONLINE 0 0 0 c4t500000E011902FA0d0 ONLINE 0 0 0 c4t500000E01190E6D0d0 ONLINE 0 0 0 errors: No known data errors pool: p3 state: ONLINE scrub: none requested config: NAME STATE READ WRITE CKSUM p3 ONLINE 0 0 0 raidz ONLINE 0 0 0 c4t500000E01194A620d0 ONLINE 0 0 0 c4t500000E0119494D0d0 ONLINE 0 0 0 c4t500000E0118ABC50d0 ONLINE 0 0 0 c4t500000E0118B0390d0 ONLINE 0 0 0 c4t500000E01194A610d0 ONLINE 0 0 0 c4t500000E01194A740d0 ONLINE 0 0 0 c4t500000E01192B3B0d0 ONLINE 0 0 0 c4t500000E0118C3230d0 ONLINE 0 0 0 c4t500000E0118AC370d0 ONLINE 0 0 0 c4t500000E01192B420d0 ONLINE 0 0 0 c4t500000E01192B540d0 ONLINE 0 0 0 c4t500000E01192B150d0 ONLINE 0 0 0 errors: No known data errors pool: p4 state: ONLINE scrub: none requested config: NAME STATE READ WRITE CKSUM p4 ONLINE 0 0 0 raidz ONLINE 0 0 0 c4t500000E01192B2F0d0 ONLINE 0 0 0 c4t500000E01194A760d0 ONLINE 0 0 0 c4t500000E01192B290d0 ONLINE 0 0 0 c4t500000E011C19D60d0 ONLINE 0 0 0 c4t500000E0118C3220d0 ONLINE 0 0 0 c4t500000E0118ABA70d0 ONLINE 0 0 0 c4t500000E01194A6F0d0 ONLINE 0 0 0 c4t500000E01192B390d0 ONLINE 0 0 0 c4t500000E01194A840d0 ONLINE 0 0 0 c4t500000E0118B7B10d0 ONLINE 0 0 0 c4t500000E01194A8A0d0 ONLINE 0 0 0 errors: No known data errors pool: p5 state: ONLINE scrub: none requested config: NAME STATE READ WRITE CKSUM p5 ONLINE 0 0 0 raidz ONLINE 0 0 0 c4t500000E0118EDCC0d0 ONLINE 0 0 0 c4t500000E0118EDCA0d0 ONLINE 0 0 0 c4t500000E0118F2060d0 ONLINE 0 0 0 c4t500000E0118F2350d0 ONLINE 0 0 0 c4t500000E0118F2180d0 ONLINE 0 0 0 c4t500000E0118F2190d0 ONLINE 0 0 0 c4t500000E0118EDB20d0 ONLINE 0 0 0 c4t500000E0118EDA10d0 ONLINE 0 0 0 c4t500000E01190E6B0d0 ONLINE 0 0 0 c4t500000E0118F21C0d0 ONLINE 0 0 0 c4t500000E0118F1FD0d0 ONLINE 0 0 0 errors: No known data errors pool: p6 state: ONLINE scrub: none requested config: NAME STATE READ WRITE CKSUM p6 ONLINE 0 0 0 raidz ONLINE 0 0 0 c4t500000E01194A810d0 ONLINE 0 0 0 
c4t500000E01194A780d0 ONLINE 0 0 0 c4t500000E01194A710d0 ONLINE 0 0 0 c4t500000E011949630d0 ONLINE 0 0 0 c4t500000E01194A730d0 ONLINE 0 0 0 c4t500000E01194A660d0 ONLINE 0 0 0 c4t500000E0119495A0d0 ONLINE 0 0 0 c4t500000E01194A720d0 ONLINE 0 0 0 c4t500000E01194A750d0 ONLINE 0 0 0 c4t500000E01194A8C0d0 ONLINE 0 0 0 c4t500000E011949570d0 ONLINE 0 0 0 errors: No known data errors bash-3.00# -- Best regards, Robert mailto:rmilkowski at task.gda.pl http://milek.blogspot.com
Hello Robert,

Wednesday, May 31, 2006, 12:22:34 AM, you wrote:

RM> The problem is that I do not see any heavy traffic on the network
RM> interfaces or in zpool iostat. However, plain old iostat shows MUCH
RM> more traffic going to the local disks. The difference is something
RM> like 10x - zpool iostat shows, for example, ~6MB/s of reads while
RM> iostat shows ~50MB/s. The question is: who's lying?
RM> [...]
bash-3.00# fsstat zfs 1
[...]
 new  name   name  attr  attr lookup rddir  read read  write write
 file remov  chng   get   set    ops   ops   ops bytes   ops bytes
   10    12     8   919     7    102     0    32  975K    26  652K zfs
    6    21    10 1.22K     1    123     0   205 6.23M     4 33.5K zfs
   14    26     3 1.14K     9    127     0    46 1.33M     5 60.1K zfs
   13    11     8 1.02K     7    102     0    43 1.24M    22  514K zfs
   10    17    10   998     6     87     0    31  746K    85 2.45M zfs
   11    15     3   915    24     93     0    60 1.86M     6 54.3K zfs
    7    31    19 1.82K     5    167     0    23  636K   278 8.22M zfs
   14    22    13 1.44K    10    104     0    31  992K   257 7.84M zfs
    5    18     5 1.16K     4     80     0    26  764K   262 8.06M zfs
    1    19     6   572     2     75     0    19  579K     3 20.6K zfs
^C

and iostat at the same time:

bash-3.00# iostat -xnzC 1|egrep "devic| c4$"
                    extended device statistics
    r/s    w/s    kr/s    kw/s wait  actv wsvc_t asvc_t  %w   %b device
 1094.7 1395.7 64212.0 13725.1  0.1 138.5    0.0   55.6   0 1060 c4
                    extended device statistics
    r/s    w/s    kr/s    kw/s wait  actv wsvc_t asvc_t  %w   %b device
  707.5 3585.5 44721.8  8486.7  0.0 583.7    0.0  136.0   0 5787 c4
                    extended device statistics
    r/s    w/s    kr/s    kw/s wait  actv wsvc_t asvc_t  %w   %b device
  871.1 3186.4 55818.3  3175.2  0.0 944.1    0.0  232.7   0 6533 c4
                    extended device statistics
    r/s    w/s    kr/s    kw/s wait  actv wsvc_t asvc_t  %w   %b device
 1117.7 2557.8 72886.6  2516.0  0.0 748.7    0.0  203.7   0 6290 c4
                    extended device statistics
    r/s    w/s    kr/s    kw/s wait  actv wsvc_t asvc_t  %w   %b device
  843.8 3277.9 54130.4  4772.4  0.0 723.7    0.0  175.6   0 6532 c4
                    extended device statistics
    r/s    w/s    kr/s    kw/s wait  actv wsvc_t asvc_t  %w   %b device
 1285.1 2339.0 76506.6  2233.9  0.0 771.5    0.0  212.9   0 6626 c4
                    extended device statistics
    r/s    w/s    kr/s    kw/s wait  actv wsvc_t asvc_t  %w   %b device
  849.8 3340.9 54389.5  3133.2  0.0 513.4    0.0  122.5   0 6212 c4
                    extended device statistics
    r/s    w/s    kr/s    kw/s wait  actv wsvc_t asvc_t  %w   %b device
  834.2 3358.8 53391.5  4774.1  0.0 640.4    0.0  152.7   0 6216 c4
                    extended device statistics
    r/s    w/s    kr/s    kw/s wait  actv wsvc_t asvc_t  %w   %b device
  905.4 3115.2 59024.8  3698.9  0.0 588.2    0.0  146.3   0 5078 c4

--
Best regards,
 Robert                          mailto:rmilkowski at task.gda.pl
                                 http://milek.blogspot.com
That's interesting - 'zpool iostat' shows quite a small read volume for any pool, but if I run 'zpool iostat -v' I can see that while the read volume for the pool is small, the read volume to each disk is actually quite large - summing all the disks in a pool gives over 10x the read volume reported for the pool itself. These data are consistent with iostat. So now even zpool claims that it actually issues 10x (and more) read volume to the disks in a pool than to the pool itself.

Now - why? It really hurts performance here...

bash-3.00# zpool iostat -v p1 1
                             capacity     operations    bandwidth
pool                       used  avail   read  write   read  write
------------------------- -----  -----  -----  -----  -----  -----
p1                         749G  67.2G     58     90   878K   903K
  raidz                    749G  67.2G     58     90   878K   903K
    c4t500000E011909320d0     -      -     15     40   959K  87.3K
    c4t500000E011909300d0     -      -     14     40   929K  86.5K
    c4t500000E011903030d0     -      -     18     40  1.11M  86.8K
    c4t500000E011903300d0     -      -     13     32   823K  77.7K
    c4t500000E0119091E0d0     -      -     15     40   961K  87.3K
    c4t500000E0119032D0d0     -      -     14     40   930K  86.5K
    c4t500000E011903370d0     -      -     18     40  1.11M  86.8K
    c4t500000E011903190d0     -      -     13     32   828K  77.8K
    c4t500000E011903350d0     -      -     15     40   964K  87.3K
    c4t500000E0119095A0d0     -      -     14     40   934K  86.5K
    c4t500000E0119032A0d0     -      -     18     40  1.11M  86.8K
    c4t500000E011903340d0     -      -     13     32   821K  77.7K
------------------------- -----  -----  -----  -----  -----  -----
                             capacity     operations    bandwidth
pool                       used  avail   read  write   read  write
------------------------- -----  -----  -----  -----  -----  -----
p1                         749G  67.2G     49     44   897K  1.02M
  raidz                    749G  67.2G     49     44   897K  1.02M
    c4t500000E011909320d0     -      -     17     25  1.05M  96.4K
    c4t500000E011909300d0     -      -     15     25   972K  96.2K
    c4t500000E011903030d0     -      -     20     25  1.25M  96.3K
    c4t500000E011903300d0     -      -     14     25   853K  91.2K
    c4t500000E0119091E0d0     -      -     16     25  1017K  96.7K
    c4t500000E0119032D0d0     -      -     15     25   955K  96.7K
    c4t500000E011903370d0     -      -     19     25  1.21M  96.6K
    c4t500000E011903190d0     -      -     13     25   843K  91.0K
    c4t500000E011903350d0     -      -     16     25  1001K  96.5K
    c4t500000E0119095A0d0     -      -     15     25   974K  96.3K
    c4t500000E0119032A0d0     -      -     20     25  1.22M  96.5K
    c4t500000E011903340d0     -      -     14     25   855K  90.7K
------------------------- -----  -----  -----  -----  -----  -----
^C
bash-3.00#

This message posted from opensolaris.org
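Summing the per-disk read column from the first sample above makes the ratio concrete (rough arithmetic, treating the K/M suffixes loosely):

  959K + 929K + 1.11M + 823K + 961K + 930K + 1.11M + 828K + 964K + 934K + 1.11M + 821K  ~= 11.5M
  ~11.5M read at the disk level vs 878K reported for the pool itself  ->  roughly 13x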
Another pool - different array, different host, different workload. And again - the summed read throughput to all the disks in a pool is 10x bigger than to the pool itself.

Any idea?

bash-3.00# zpool iostat -v 1
                                           capacity     operations    bandwidth
pool                                     used  avail   read  write   read  write
--------------------------------------  -----  -----  -----  -----  -----  -----
nfs-s5-1                                4.32T  16.1T    304    127  11.9M   506K
  raidz                                 4.32T  16.1T    304    127  11.9M   506K
    c4t600C0FF00000000009258F2411CF3D01d0   -      -    148     48  8.48M  67.8K
    c4t600C0FF00000000009258F6FA45D3801d0   -      -    148     48  8.48M  67.9K
    c4t600C0FF00000000009258F1820617F01d0   -      -    146     48  8.46M  67.9K
    c4t600C0FF00000000009258F24546FAC01d0   -      -    146     48  8.45M  67.9K
    c4t600C0FF00000000009258F5949030301d0   -      -    146     48  8.46M  67.9K
    c4t600C0FF00000000009258F24E8AADD01d0   -      -    146     48  8.45M  67.9K
    c4t600C0FF00000000009258F5FD5023B01d0   -      -    146     48  8.46M  67.9K
    c4t600C0FF00000000009258F17E7007801d0   -      -    146     48  8.46M  67.9K
    c4t600C0FF00000000009258F598F6BE701d0   -      -    146     48  8.46M  67.8K
--------------------------------------  -----  -----  -----  -----  -----  -----
                                           capacity     operations    bandwidth
pool                                     used  avail   read  write   read  write
--------------------------------------  -----  -----  -----  -----  -----  -----
nfs-s5-1                                4.32T  16.1T    508     72  34.5M   282K
  raidz                                 4.32T  16.1T    508     72  34.5M   282K
    c4t600C0FF00000000009258F2411CF3D01d0   -      -    254     25  14.1M  37.6K
    c4t600C0FF00000000009258F6FA45D3801d0   -      -    248     24  13.8M  38.1K
    c4t600C0FF00000000009258F1820617F01d0   -      -    247     26  13.9M  37.6K
    c4t600C0FF00000000009258F24546FAC01d0   -      -    240     26  13.8M  37.8K
    c4t600C0FF00000000009258F5949030301d0   -      -    243     25  14.0M  37.3K
    c4t600C0FF00000000009258F24E8AADD01d0   -      -    246     26  13.8M  38.3K
    c4t600C0FF00000000009258F5FD5023B01d0   -      -    242     25  13.6M  38.6K
    c4t600C0FF00000000009258F17E7007801d0   -      -    238     27  13.5M  39.4K
    c4t600C0FF00000000009258F598F6BE701d0   -      -    258     27  14.6M  39.7K
--------------------------------------  -----  -----  -----  -----  -----  -----
^C
bash-3.00#

This message posted from opensolaris.org
There are a few related questions that I think you want answered.

1. How does RAID-Z affect performance?

When using RAID-Z, each filesystem block is spread across (typically) all disks in the raid-z group. So to a first approximation, each raid-z group provides the iops of a single disk (but the bandwidth of N-1 disks). See Roch's excellent article for a detailed explanation:

http://blogs.sun.com/roller/page/roch?entry=when_to_and_not_to

2. Why does 'zpool iostat' not report actual i/os?
3. Why are we doing so many read i/os?
4. Why are we reading so much data?

'zpool iostat' reports the i/os that are seen at each level of the vdev tree, rather than the sum of the i/os that occur below that point in the vdev tree. This can provide some additional information when diagnosing performance problems. However, it is a bit counter-intuitive, so I always use iostat(1m). It may be clunky, but it does report the actual i/os issued to the hardware. Also, I really like having the %busy reading, which 'zpool iostat' does not provide.

We are doing lots of read i/os because each block is spread out across all the disks in a raid-z group (as mentioned in Roch's article). However, the "vdev cache" is causing us to issue many *fewer* i/os than would seem to be required, but to read much *more* data.

For example, say we need to read a block of data. We'll send the read down to the raid-z vdev. The raid-z vdev knows that the data is spread out over its disks, so it (essentially) issues one read zio_t to each of the disk vdevs to retrieve the data. Now each of those disk vdevs will first look in its vdev cache. If it finds the data there, it returns it without ever actually issuing an i/o to the hardware. If it doesn't find it there, it will issue a 64k i/o to the hardware, and put that 64k chunk into its vdev cache.

Without the vdev cache, we would simply issue (number of blocks to read) * (number of disks in each raid-z vdev) read i/os to the hardware, and read the total number of bytes that you would expect, since each of those i/os would be for (approximately) 1/Ndisk bytes. However, with the vdev cache, we issue fewer i/os but read more data.

5. How can performance be improved?

A. Use one big pool.

Having 6 pools causes performance (and storage) to be stranded. When one filesystem is busier than the others, it can only use the bandwidth and iops of its single raid-z vdev. If you had one big pool, that filesystem would be able to use all the disks in your system.

B. Use smaller raid-z stripes.

As Roch's article explains, smaller raid-z stripes will provide more iops. We generally suggest 3 to 9 disks in each raid-z stripe.

C. Use higher-performance disks.

I'm not sure what the underlying storage you're using is, but it's pretty slow! As you can see from your per-disk iostat output, each device is only capable of 50-100 iops or 1-4MB/s, and takes on average over 100ms to service a request. If you are using some sort of hardware RAID enclosure, it may be working against you here. The preferred configuration would be to have each disk appear as a single device to the system. (This should be possible even with fancy RAID hardware.)

So in conclusion, you can improve performance by creating one big pool with several raid-z stripes, each with 3 to 9 disks in it. These disks should be actual physical disks.

Hope this helps,
--matt

ps.
I'm drawing my conclusions based on the following data that you provided:

On Wed, May 31, 2006 at 08:26:10AM -0700, Robert Milkowski wrote:

> That's interesting - 'zpool iostat' shows quite a small read volume for
> any pool, but if I run 'zpool iostat -v' I can see that while the read
> volume for the pool is small, the read volume to each disk is actually
> quite large - summing all the disks in a pool gives over 10x the read
> volume reported for the pool itself. These data are consistent with
> iostat. So now even zpool claims that it actually issues 10x (and
> more) read volume to the disks in a pool than to the pool itself.
>
> Now - why? It really hurts performance here...

> The problem is that I do not see any heavy traffic on the network
> interfaces or in zpool iostat. However, plain old iostat shows MUCH
> more traffic going to the local disks. The difference is something
> like 10x - zpool iostat shows, for example, ~6MB/s of reads while
> iostat shows ~50MB/s. The question is: who's lying?

>    r/s   w/s    kr/s   kw/s wait  actv wsvc_t asvc_t  %w  %b device
>   57.6   2.0  3752.7   35.4  0.0   8.9    0.0  149.6   0  82 c4t...90E750d0
>   43.5   2.0  2846.9   35.4  0.0   5.3    0.0  116.4   0  66 c4t...90E7F0d0
>   56.6   2.0  3752.7   35.4  0.0   8.4    0.0  144.0   0  81 c4t...90E730d0
>   29.3  44.5  1558.9  212.8  0.0   5.0    0.0   67.1   0  96 c4t...92B540d0
>    9.1  65.7   582.3  313.9  0.0  11.2    0.0  149.4   0 100 c4t...8EDB20d0
>    0.0  78.9     0.0   75.3  0.0  35.0    0.0  443.8   0 100 c4t...9495A0d0

> bash-3.00# fsstat zfs 1
> [...]
>  new  name   name  attr  attr lookup rddir  read read  write write
>  file remov  chng   get   set    ops   ops   ops bytes   ops bytes
>    10    12     8   919     7    102     0    32  975K    26  652K zfs
>     6    21    10 1.22K     1    123     0   205 6.23M     4 33.5K zfs
>    14    26     3 1.14K     9    127     0    46 1.33M     5 60.1K zfs
>    13    11     8 1.02K     7    102     0    43 1.24M    22  514K zfs
>    10    17    10   998     6     87     0    31  746K    85 2.45M zfs
>    11    15     3   915    24     93     0    60 1.86M     6 54.3K zfs
>     7    31    19 1.82K     5    167     0    23  636K   278 8.22M zfs
>    14    22    13 1.44K    10    104     0    31  992K   257 7.84M zfs
>     5    18     5 1.16K     4     80     0    26  764K   262 8.06M zfs
>     1    19     6   572     2     75     0    19  579K     3 20.6K zfs

> bash-3.00# zpool iostat -v p1 1
>                              capacity     operations    bandwidth
> pool                       used  avail   read  write   read  write
> ------------------------- -----  -----  -----  -----  -----  -----
> p1                         749G  67.2G     58     90   878K   903K
>   raidz                    749G  67.2G     58     90   878K   903K
>     c4t500000E011909320d0     -      -     15     40   959K  87.3K
>     c4t500000E011909300d0     -      -     14     40   929K  86.5K
>     c4t500000E011903030d0     -      -     18     40  1.11M  86.8K
>     c4t500000E011903300d0     -      -     13     32   823K  77.7K
>     c4t500000E0119091E0d0     -      -     15     40   961K  87.3K
>     c4t500000E0119032D0d0     -      -     14     40   930K  86.5K
>     c4t500000E011903370d0     -      -     18     40  1.11M  86.8K
>     c4t500000E011903190d0     -      -     13     32   828K  77.8K
>     c4t500000E011903350d0     -      -     15     40   964K  87.3K
>     c4t500000E0119095A0d0     -      -     14     40   934K  86.5K
>     c4t500000E0119032A0d0     -      -     18     40  1.11M  86.8K
>     c4t500000E011903340d0     -      -     13     32   821K  77.7K
> ------------------------- -----  -----  -----  -----  -----  -----
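To put rough numbers on the vdev cache effect described above (an illustration only - it assumes the default 128K recordsize, a cold vdev cache, a 12-disk raid-z group like p1, and the 1/Ndisk approximation rather than the exact on-disk layout):

  one 128K block read on a 12-disk raid-z  ->  ~128K/12 = ~11K wanted from each disk
  each device read is rounded up to 64K    ->  ~12 x 64K = 768K actually read from disk
  bytes read from disk vs bytes returned   ->  768K / 128K = ~6x, and worse for smaller blocks

which is in the same ballpark as the ~10x gap between iostat and 'zpool iostat' reported above.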
Hello Matthew,

Wednesday, May 31, 2006, 8:09:08 PM, you wrote:

MA> [...]
MA> So in conclusion, you can improve performance by creating one big pool
MA> with several raid-z stripes, each with 3 to 9 disks in it. These disks
MA> should be actual physical disks.
MA> Hope this helps,

That helps a lot - thank you.

I wish I had known this before... The information Roch put on his blog should be explained both in the man pages and in the ZFS Admin Guide, as this is something one would not expect. It actually means raid-z is useless in many environments compared to traditional raid-5.

Now I use 3510 JBODs connected over two loops with MPxIO. The disks are 73GB 15K, so they should be quite fast.

Now I have to find out how to migrate away from raid-z...

--
Best regards,
 Robert                          mailto:rmilkowski at task.gda.pl
                                 http://milek.blogspot.com
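For what it's worth, the layout Matt suggests - one pool built from several small raid-z groups - would look something like the sketch below (placeholder device names, and 6-disk groups are just one choice within the suggested 3-9 range; since a raid-z vdev cannot be restriped in place, getting there means building a new pool and copying the data over):

  zpool create bigpool \
      raidz c1t0d0  c1t1d0  c1t2d0  c1t3d0  c1t4d0  c1t5d0 \
      raidz c1t6d0  c1t7d0  c1t8d0  c1t9d0  c1t10d0 c1t11d0 \
      raidz c1t12d0 c1t13d0 c1t14d0 c1t15d0 c1t16d0 c1t17d0

(and so on for the remaining disks). ZFS then stripes new writes dynamically across all the raid-z groups in the pool.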
> That helps a lot - thank you.
> I wish I had known this before... The information Roch put on his blog
> should be explained both in the man pages and in the ZFS Admin Guide,
> as this is something one would not expect.
>
> It actually means raid-z is useless in many environments compared to
> traditional raid-5.

Well, it's a trade-off. With RAID-5 you pay the RAID tax on writes; with RAID-Z you pay the tax on reads.

There's also another factor at play here, which is purely a matter of implementation that we need to fix. With a RAID-Z setup, all blocks are written in RAID-Z format -- even intent log blocks, which is really stupid. If you do a lot of synchronous writes, that really hurts your write bandwidth. But it's unnecessary.

Since we know that intent log blocks don't live for more than a single transaction group (which is about five seconds), there's no reason to allocate them space-efficiently. It would be far better, when allocating a B-byte intent log block in an N-disk RAID-Z group, to allocate B*N bytes but only write to one disk (or two if you want to be paranoid). This simple change should make synchronous I/O on N-way RAID-Z up to N times faster.

Jeff
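As a rough illustration of the "up to N times faster" claim (numbers invented for the example; it follows the description above, not the actual allocator): take N = 12 disks and B = 8K intent log blocks.

  today    -> each 8K log block is written in RAID-Z format, so every block
              touches (roughly) all 12 spindles with tiny writes
  proposed -> each 8K log block is written whole to a single disk, so up to
              12 log blocks can be in flight in parallel, one per spindle
  cost     -> 12 x 8K = 96K allocated (but mostly unwritten) per block, freed
              again within one transaction group (~5 seconds)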
Jeff Bonwick wrote:

> ...
> Since we know that intent log blocks don't live for more than a
> single transaction group (which is about five seconds), there's
> no reason to allocate them space-efficiently. It would be far
> better, when allocating a B-byte intent log block in an N-disk
> RAID-Z group, to allocate B*N bytes but only write to one disk
> (or two if you want to be paranoid). This simple change should
> make synchronous I/O on N-way RAID-Z up to N times faster.

Would it make sense to keep the space-efficient allocation code around for times when disk space gets tight (as in less than 100 free blocks or similar)?

Darren
Hello Jeff,

Thursday, June 1, 2006, 10:36:18 AM, you wrote:

>> That helps a lot - thank you.
>> I wish I had known this before... The information Roch put on his blog
>> should be explained both in the man pages and in the ZFS Admin Guide,
>> as this is something one would not expect.
>>
>> It actually means raid-z is useless in many environments compared to
>> traditional raid-5.

JB> Well, it's a trade-off. With RAID-5 you pay the RAID tax on writes;
JB> with RAID-Z you pay the tax on reads.

I know - I only wish I had known better - raid-z should be explained better in the documentation.

btw: what differences will there be between raidz1 and raidz2? I guess two parity blocks will be stored, so one loses approximately the space of two disks in a raidz2 group. Any other things?

JB> There's also another factor at play here, which is purely a matter
JB> of implementation that we need to fix. With a RAID-Z setup, all
JB> blocks are written in RAID-Z format -- even intent log blocks,
JB> which is really stupid. If you do a lot of synchronous writes,
JB> that really hurts your write bandwidth. But it's unnecessary.
JB> Since we know that intent log blocks don't live for more than a
JB> single transaction group (which is about five seconds), there's
JB> no reason to allocate them space-efficiently. It would be far
JB> better, when allocating a B-byte intent log block in an N-disk
JB> RAID-Z group, to allocate B*N bytes but only write to one disk
JB> (or two if you want to be paranoid). This simple change should
JB> make synchronous I/O on N-way RAID-Z up to N times faster.

That would probably be very useful on nfs servers.

btw: just a quick thought - why not write one block to only 2 disks (+ parity on one disk) instead of spreading one fs block across N-1 disks? That way zfs could read many fs blocks at the same time in the case of larger raid-z pools?

--
Best regards,
 Robert                          mailto:rmilkowski at task.gda.pl
                                 http://milek.blogspot.com
Roch Bourbonnais - Performance Engineering
2006-Jun-01 13:00 UTC
[zfs-discuss] Re: Big IOs overhead due to ZFS?
Robert Milkowski writes:

> btw: just a quick thought - why not write one block to only 2 disks
> (+ parity on one disk) instead of spreading one fs block across N-1
> disks? That way zfs could read many fs blocks at the same time in
> the case of larger raid-z pools?

That's what you have today with a dynamic stripe of (2+1) raid-z vdevs. If the user requests more devices in a group, it's because he wants to have those disk blocks for the storage.

-r
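Concretely, a "dynamic stripe of (2+1) raid-z vdevs" is just a pool built from three-disk raid-z groups, for example (placeholder device names):

  zpool create tank \
      raidz c2t0d0 c2t1d0 c2t2d0 \
      raidz c2t3d0 c2t4d0 c2t5d0 \
      raidz c2t6d0 c2t7d0 c2t8d0

Each group stores two disks' worth of data plus one of parity, and blocks are striped dynamically across the groups.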
On Thu, 2006-06-01 at 04:36, Jeff Bonwick wrote:

> It would be far
> better, when allocating a B-byte intent log block in an N-disk
> RAID-Z group, to allocate B*N bytes but only write to one disk
> (or two if you want to be paranoid). This simple change should
> make synchronous I/O on N-way RAID-Z up to N times faster.

I dunno about *wanting* to be paranoid... I'd think that, to provide the same level of survivability in the face of disk failure as regular data, you'd need to write to a minimum of 2 disks in a regular RAID-Z and a minimum of three in a dual-parity raid-Z...

- Bill
Hello Roch,

Thursday, June 1, 2006, 3:00:46 PM, you wrote:

RBPE> Robert Milkowski writes:
>> btw: just a quick thought - why not write one block to only 2 disks
>> (+ parity on one disk) instead of spreading one fs block across N-1
>> disks? That way zfs could read many fs blocks at the same time in
>> the case of larger raid-z pools?

RBPE> That's what you have today with a dynamic stripe of (2+1)
RBPE> raid-z vdevs. If the user requests more devices in a group,
RBPE> it's because he wants to have those disk blocks for the
RBPE> storage.

Yeah, that's right - silly me :)

--
Best regards,
 Robert                          mailto:rmilkowski at task.gda.pl
                                 http://milek.blogspot.com
On Thu, Jun 01, 2006 at 02:46:32PM +0200, Robert Milkowski wrote:

> btw: what differences will there be between raidz1 and raidz2? I guess
> two parity blocks will be stored, so one loses approximately the space
> of two disks in a raidz2 group. Any other things?

The difference between raidz1 and raidz2 is just that the latter is resilient against losing 2 disks rather than just 1. If you have a total of 5 disks in a raidz1 stripe your optimal capacity will be 4/5ths of the raw capacity of the disks, whereas it would be 3/5ths with raidz2. Consider, however, that you'll typically use larger stripes with raidz2, so you aren't necessarily going to "lose" any capacity depending on how you configure your pool.

Adam

--
Adam Leventhal, Solaris Kernel Development       http://blogs.sun.com/ahl
Hello Adam,

Friday, June 2, 2006, 12:10:47 AM, you wrote:

AL> The difference between raidz1 and raidz2 is just that the latter is
AL> resilient against losing 2 disks rather than just 1. If you have a total
AL> of 5 disks in a raidz1 stripe your optimal capacity will be 4/5ths of the
AL> raw capacity of the disks, whereas it would be 3/5ths with raidz2. Consider,
AL> however, that you'll typically use larger stripes with raidz2, so you aren't
AL> necessarily going to "lose" any capacity depending on how you configure your
AL> pool.

If I have 6 disks - wouldn't a pool with 2x raidz1 (3 disks each) actually be faster than a pool with one raidz2 (6 disks) for many small random reads? I know that the redundancy with raidz2 would be better, as any two disks can fail, while with 2x raidz1 only one disk from each raidz1 group can fail.
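Working that comparison through with the rules of thumb from earlier in the thread (Matt's "each raid-z group provides the iops of a single disk" and Adam's capacity arithmetic) - an approximation, not a measurement:

  2 x raidz1 (3 disks each): usable space ~4 disks, small random read iops ~2 disks' worth,
                             survives one failure per 3-disk group
  1 x raidz2 (6 disks):      usable space ~4 disks, small random read iops ~1 disk's worth,
                             survives any two failures

So for the same usable capacity, the two smaller raidz1 groups should indeed give roughly twice the small random read iops, at the cost of weaker redundancy.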