Hello. I have a question about how ZFS works with "dynamic striping".

Let's start with the following situation:
- 4 disks of 100MB each, striped under ZFS.
- 75% of the stripe is in use, so we have 100MB free. (easy)

Now we add a new 100MB disk to the pool. We then have 200MB free, but only 100MB of it will have the speed of 4 disks; the remaining 100MB will have the speed of 1 disk.

The questions are:
- Does ZFS have any kind of reorganization of the data in the stripe that changes this situation, so that we end up with 200MB free at the speed of 5 disks?
- If the answer is yes, how is it done? In the background?

Thanks for your time (and sorry for my English).

JLBG
Jose Luis Barquín Guerola wrote:
> Hello. I have a question about how ZFS works with "dynamic striping".
>
> Let's start with the following situation:
> - 4 disks of 100MB each, striped under ZFS.
> - 75% of the stripe is in use, so we have 100MB free. (easy)
>
> Now we add a new 100MB disk to the pool. We then have 200MB free, but only 100MB of it will have the speed of 4 disks; the remaining 100MB will have the speed of 1 disk.
>
> The questions are:
> - Does ZFS have any kind of reorganization of the data in the stripe that changes this situation, so that we end up with 200MB free at the speed of 5 disks?
> - If the answer is yes, how is it done? In the background?
>
> Thanks for your time (and sorry for my English).
>
> JLBG

When you add more vdevs to the zpool, NEW data is written at the new stripe width. That is, data written to the original pool went across 4 drives; new data will now be written across 5 drives. Existing data WILL NOT be changed.

So, for a zpool that is 75% full, you will NOT get to use the first 75% of the newly added vdev immediately.

Thus, in your case, you started with a 400MB zpool holding 300MB of data. You added another 100MB vdev, resulting in a 500MB zpool. The 300MB is written across 4 drives and will have the corresponding speed. 75% of the new vdev isn't immediately usable (as it corresponds to the 75% in use on the other 4 vdevs), so you have effectively added only 25MB of immediately usable space. Thus, you have:

300MB across 4 vdevs
125MB across 5 vdevs
75MB of "wasted" space on 1 vdev

To correct this - that is, to recover the 75MB of "wasted" space and to move the 300MB from spanning 4 vdevs to spanning 5 vdevs - you need to rewrite the entire existing data set. Right now, there is no background or other automatic method to do this; 'cp -rp' or 'rsync' is a good idea. We really should have something like 'zpool scrub' do this automatically.

--
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA
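A minimal sketch of that add-then-rewrite approach, assuming a hypothetical pool named 'tank' whose data lives under /tank/data (the pool, device, and path names below are invented for illustration):

# Grow the stripe by adding a fifth disk as a new top-level vdev
# (substitute your own device name).
zpool add tank c0t4d0

# ZFS will not rebalance existing blocks, so rewrite them: the new
# copies are allocated across all five vdevs, and removing the old
# directory frees the 4-wide blocks.
cp -rp /tank/data /tank/data.rewritten
mv /tank/data /tank/data.old
mv /tank/data.rewritten /tank/data
rm -rf /tank/data.old

# Inspect how allocations are now spread per vdev.
zpool iostat -v tank

Note that if snapshots still reference the old blocks, the space is not freed until those snapshots are destroyed; on a live dataset, 'rsync' or 'zfs send'/'zfs receive' into a new dataset may be a safer way to perform the same rewrite.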
Erik Trimble wrote:
> Jose Luis Barquín Guerola wrote:
>> Hello. I have a question about how ZFS works with "dynamic striping".
>>
>> Let's start with the following situation:
>> - 4 disks of 100MB each, striped under ZFS.
>> - 75% of the stripe is in use, so we have 100MB free. (easy)
>>
>> Now we add a new 100MB disk to the pool. We then have 200MB free, but only 100MB of it will have the speed of 4 disks; the remaining 100MB will have the speed of 1 disk.
>>
>> The questions are:
>> - Does ZFS have any kind of reorganization of the data in the stripe that changes this situation, so that we end up with 200MB free at the speed of 5 disks?
>> - If the answer is yes, how is it done? In the background?

Yes, new writes are biased towards the more-empty vdev.

>> Thanks for your time (and sorry for my English).
>>
>> JLBG
>
> When you add more vdevs to the zpool, NEW data is written at the new stripe width. That is, data written to the original pool went across 4 drives; new data will now be written across 5 drives. Existing data WILL NOT be changed.
>
> So, for a zpool that is 75% full, you will NOT get to use the first 75% of the newly added vdev immediately.
>
> Thus, in your case, you started with a 400MB zpool holding 300MB of data. You added another 100MB vdev, resulting in a 500MB zpool. The 300MB is written across 4 drives and will have the corresponding speed. 75% of the new vdev isn't immediately usable (as it corresponds to the 75% in use on the other 4 vdevs), so you have effectively added only 25MB of immediately usable space. Thus, you have:
>
> 300MB across 4 vdevs
> 125MB across 5 vdevs
> 75MB of "wasted" space on 1 vdev
>
> To correct this - that is, to recover the 75MB of "wasted" space and to move the 300MB from spanning 4 vdevs to spanning 5 vdevs - you need to rewrite the entire existing data set. Right now, there is no background or other automatic method to do this; 'cp -rp' or 'rsync' is a good idea. We really should have something like 'zpool scrub' do this automatically.

No. Dynamic striping is not RAID-0, which is what you are describing. In a dynamic stripe, the data written is not divided up among the current devices in the stripe. Rather, data is chunked and written to the vdevs: when about 500 kBytes has been written to a vdev, the next chunk is written to another vdev. The choice of which vdev to go to next is based, in part, on the amount of free space available on that vdev. So you get your cake (stochastic spreading of data across vdevs) and you get to eat it (use all the available space), too.
-- richard
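One way to watch the behaviour Richard describes is a throwaway pool built on files, sized like the original question. Everything below (the pool name 'demo', the /var/tmp paths, the exact file sizes) is a made-up sketch, and the ~500 kByte chunking comes from Richard's description rather than from anything this example measures:

# Create four 100MB backing files and a plain (non-redundant) striped pool.
mkfile 100m /var/tmp/d1 /var/tmp/d2 /var/tmp/d3 /var/tmp/d4
zpool create demo /var/tmp/d1 /var/tmp/d2 /var/tmp/d3 /var/tmp/d4

# Fill the pool to roughly 75%.
mkfile 280m /demo/bigfile

# Add a fifth 100MB vdev, then write some new data.
mkfile 100m /var/tmp/d5
zpool add demo /var/tmp/d5
mkfile 50m /demo/newfile

# Per-vdev capacity: the old data stays where it was written, while the
# new allocations are biased toward the emptier fifth vdev.
zpool iostat -v demo

# Clean up the throwaway pool.
zpool destroy demo
rm /var/tmp/d1 /var/tmp/d2 /var/tmp/d3 /var/tmp/d4 /var/tmp/d5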
Thank you "Relling" and "et151817" for your answers. So just to end the post: Relling supouse the next situation: One zpool in "Dinamic Stripe" with two disk, one of 100MB and the second with 200MB if the spread is "stochastic spreading of data across vdevs" you will have the double of possibilities of save one chunk in the second disk than in the first, right? Thanks for your time (and sorry for my english). JLBG -- This message posted from opensolaris.org
Jose Luis Barquín Guerola wrote:
> Thank you "Relling" and "et151817" for your answers.
>
> So, just to end the thread, Relling, suppose the following situation: one zpool with a dynamic stripe across two disks, one of 100MB and the second of 200MB.
>
> If the spread is a "stochastic spreading of data across vdevs", a given chunk is twice as likely to be saved on the second disk as on the first, right?

The simple answer is yes. The more complex answer is that copies will try to be spread across different vdevs. Metadata, by default, uses copies=2, so you could expect the metadata to be more evenly spread across the disks.
-- richard
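A small sketch of the two effects Richard mentions, on a throwaway pool shaped like the 100MB/200MB example in the question (the pool name, file paths, and data sizes below are invented for illustration; 'zfs set copies=2' and 'zpool iostat -v' are the actual commands involved):

# Two file-backed vdevs of different sizes in one dynamic stripe.
mkfile 100m /var/tmp/small
mkfile 200m /var/tmp/large
zpool create uneven /var/tmp/small /var/tmp/large

# With the default copies=1, allocation is weighted by free space, so
# roughly twice as much of this file should land on the larger vdev.
mkfile 60m /uneven/one-copy
zpool iostat -v uneven

# Metadata already keeps two copies by default; user data can opt in.
# With copies=2, ZFS tries to place the copies on different vdevs,
# which evens out the spread (and halves the usable space).
zfs set copies=2 uneven
mkfile 30m /uneven/two-copies
zpool iostat -v uneven

# Clean up the throwaway pool.
zpool destroy uneven
rm /var/tmp/small /var/tmp/large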
> try to be spread across different vdevs.

% zpool iostat -v
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
z            686G   434G     40      5  2.46M   271K
  c1t0d0s7   250G   194G     14      1   877K  94.2K
  c1t1d0s7   244G   200G     15      2   948K  96.5K
  c0d0       193G  39.1G     10      1   689K  80.2K

Note that c0d0 is basically full, but it is still serving 10 reads for every 15 on the other disks, and about 82% of their write bandwidth.

Rob