Hi list,

My ZFS write performance is poor and I need your help.

I created a zpool with 2 raidz1 vdevs. When the space was about to run out,
I added another 2 raidz1 vdevs to extend the zpool. After some days the
zpool was almost full, so I removed some old data. But now, as shown below,
the first 2 raidz1 vdevs are about 78% used and the last 2 raidz1 vdevs are
about 93% used.

I have this line in /etc/system:

set zfs:metaslab_df_free_pct=4

so the performance degradation should only happen when vdev usage goes
above 90%.

All my files are small, about 150KB each.

Now the questions are:
1. Should I balance the data between the vdevs by copying the data and then
removing the data that lives on the last 2 vdevs?
2. Is there any method to automatically re-balance the data? Or is there
any better solution to this problem?

root at nas-01:~# zpool iostat -v
                                           capacity     operations    bandwidth
pool                                     used  avail   read  write   read  write
--------------------------------------  -----  -----  -----  -----  -----  -----
datapool                                21.3T  3.93T     26     96  81.4K  2.81M
  raidz1                                4.93T  1.39T      8     28  25.7K   708K
    c3t6002219000854867000003B2490FB009d0    -      -      3     10   216K   119K
    c3t6002219000854867000003B4490FB063d0    -      -      3     10   214K   119K
    c3t60022190008528890000055F4CB79C10d0    -      -      3     10   214K   119K
    c3t6002219000854867000003B8490FB0FFd0    -      -      3     10   215K   119K
    c3t6002219000854867000003BA490FB14Fd0    -      -      3     10   215K   119K
    c3t60022190008528890000041C490FAFA0d0    -      -      3     10   215K   119K
    c3t6002219000854867000003C0490FB27Dd0    -      -      3     10   214K   119K
  raidz1                                4.64T  1.67T      8     32  24.6K   581K
    c3t6002219000854867000003C2490FB2BFd0    -      -      3     10   224K  98.2K
    c3t60022190008528890000041F490FAFD0d0    -      -      3     10   222K  98.2K
    c3t600221900085288900000428490FB0D8d0    -      -      3     10   222K  98.2K
    c3t600221900085288900000422490FB02Cd0    -      -      3     10   223K  98.3K
    c3t600221900085288900000425490FB07Cd0    -      -      3     10   223K  98.3K
    c3t600221900085288900000434490FB24Ed0    -      -      3     10   223K  98.3K
    c3t60022190008528890000043949100968d0    -      -      3     10   224K  98.2K
  raidz1                                5.88T   447G      5     17  16.0K  67.7K
    c3t60022190008528890000056B4CB79D66d0    -      -      3     12   215K  12.2K
    c3t6002219000854867000004B94CB79F91d0    -      -      3     12   216K  12.2K
    c3t6002219000854867000004BB4CB79FE1d0    -      -      3     12   214K  12.2K
    c3t6002219000854867000004BD4CB7A035d0    -      -      3     12   215K  12.2K
    c3t6002219000854867000004BF4CB7A0ABd0    -      -      3     12   216K  12.2K
    c3t60022190008528890000055C4CB79BB8d0    -      -      3     12   214K  12.2K
    c3t6002219000854867000004C14CB7A0FDd0    -      -      3     12   215K  12.2K
  raidz1                                5.88T   441G      4      1  14.9K  12.4K
    c3t60022190008528890000042B490FB124d0    -      -      1      1   131K  2.33K
    c3t6002219000854867000004C54CB7A199d0    -      -      1      1   132K  2.33K
    c3t6002219000854867000004C74CB7A1D5d0    -      -      1      1   130K  2.33K
    c3t6002219000852889000005594CB79B64d0    -      -      1      1   133K  2.33K
    c3t6002219000852889000005624CB79C86d0    -      -      1      1   132K  2.34K
    c3t6002219000852889000005654CB79CCCd0    -      -      1      1   131K  2.34K
    c3t6002219000852889000005684CB79D1Ed0    -      -      1      1   132K  2.33K
  c3t6B8AC6F0000F8376000005864DC9E9F1d0      0   928G      0     16    289  1.47M
--------------------------------------  -----  -----  -----  -----  -----  -----
root at nas-01:~#
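For reference, here is a rough way to check the per-vdev fill levels and to
confirm the tunable took effect (a sketch only: the nawk assumes the exact
column layout shown above and only looks at the raidz1 summary rows, and the
mdb line assumes the metaslab_df_free_pct symbol is unambiguous to mdb -k):

    # Confirm the /etc/system tunable took effect on the running kernel:
    echo 'metaslab_df_free_pct/D' | mdb -k

    # Rough per-vdev fill percentages from the zpool iostat -v columns:
    zpool iostat -v datapool | nawk '
      function bytes(s,  n, u) {            # convert 4.93T / 447G etc. to bytes
        n = s + 0; u = substr(s, length(s), 1)
        if (u == "T") n *= 1024^4
        else if (u == "G") n *= 1024^3
        else if (u == "M") n *= 1024^2
        else if (u == "K") n *= 1024
        return n
      }
      $1 == "raidz1" {                      # vdev summary rows: name, used, avail, ...
        used = bytes($2); avail = bytes($3)
        printf("raidz1 vdev %d: %.0f%% full\n", ++i, 100 * used / (used + avail))
      }'

On this pool that should confirm the roughly 78% / 93% split described above.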
Edward Ned Harvey
2011-Nov-09 14:05 UTC
[zfs-discuss] Data distribution not even between vdevs
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
> bounces at opensolaris.org] On Behalf Of Ding Honghui
>
> But now, as shown below, the first 2 raidz1 vdevs are about 78% used and
> the last 2 raidz1 vdevs are about 93% used.

In this case, when you write, it should be writing to the first two vdevs,
not the last two. So the fact that the last two are over 93% full should be
irrelevant in terms of write performance.

> All my files are small, about 150KB each.

That's too bad. Raidz performs well with large sequential data, and
performs poorly with small random files.

> Now the questions are:
> 1. Should I balance the data between the vdevs by copying the data and
> then removing the data that lives on the last 2 vdevs?

If you want to. But most people wouldn't bother, especially since you're
talking about 78% versus 93%. It's difficult to balance it so *precisely*
as to get them both around 85%.

> 2. Is there any method to automatically re-balance the data?

There is no automatic way to do it.

> Or is there any better solution to this problem?

I would recommend, if possible, re-creating your pool as a bunch of mirrors
instead of raidz. It will perform better, but it will cost hardware. Also,
if you have compressible data, then enabling compression gains both
performance and available disk space.
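Concretely, the copy-and-remove approach might look something like this (a
sketch only: the dataset name datapool/data is made up, rsync could just as
well be cp -pr or zfs send | zfs recv, and it assumes you have room for two
copies at once and don't mind losing snapshots of the old dataset):

    # Hypothetical dataset name; rewritten blocks will mostly land on the
    # emptier vdevs because the allocator favours them.
    zfs create datapool/data_new
    rsync -a /datapool/data/ /datapool/data_new/
    # Verify the copy, then swap the datasets:
    zfs destroy -r datapool/data
    zfs rename datapool/data_new datapool/data

    # The compression suggestion (affects newly written data only):
    zfs set compression=on datapool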
Gregg Wonderly
2011-Nov-09 15:09 UTC
[zfs-discuss] Data distribution not even between vdevs
On 11/9/2011 8:05 AM, Edward Ned Harvey wrote:
>> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
>> bounces at opensolaris.org] On Behalf Of Ding Honghui
>>
>> But now, as shown below, the first 2 raidz1 vdevs are about 78% used and
>> the last 2 raidz1 vdevs are about 93% used.
>
> In this case, when you write, it should be writing to the first two vdevs,
> not the last two. So the fact that the last two are over 93% full should
> be irrelevant in terms of write performance.
>
>> All my files are small, about 150KB each.
>
> That's too bad. Raidz performs well with large sequential data, and
> performs poorly with small random files.
>
>> Now the questions are:
>> 1. Should I balance the data between the vdevs by copying the data and
>> then removing the data that lives on the last 2 vdevs?
>
> If you want to. But most people wouldn't bother, especially since you're
> talking about 78% versus 93%. It's difficult to balance it so *precisely*
> as to get them both around 85%.
>
>> 2. Is there any method to automatically re-balance the data?
>
> There is no automatic way to do it.

For me, this is a key issue. If there was an automatic rebalancing
mechanism, that same mechanism would work perfectly to allow pools to have
disk sets removed. It would provide the needed basic mechanism of just
moving stuff around to eliminate the use of a particular part of the pool
that you wanted to remove.

Gregg Wonderly
Edward Ned Harvey
2011-Nov-10 13:31 UTC
[zfs-discuss] Data distribution not even between vdevs
> From: Gregg Wonderly [mailto:greggwon at gmail.com]
>
> > There is no automatic way to do it.
>
> For me, this is a key issue. If there was an automatic rebalancing
> mechanism, that same mechanism would work perfectly to allow pools to
> have disk sets removed. It would provide the needed basic mechanism of
> just moving stuff around to eliminate the use of a particular part of
> the pool that you wanted to remove.

Search this list for bp_rewrite. There are many features that depend on it:
rebalance, defrag, vdev removal, toggling compression or dedup for existing
data, etc. It has long been requested by many people, but apparently it is
fundamentally difficult to do, or something.