Hi,

This may well have been covered before, but I've not been able to find an answer to this particular question.

I've set up a raidz2 test env using files like this:

# mkfile 1g t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 s1 s2
# zpool create dataPool raidz2 /xvm/t1 /xvm/t2 /xvm/t3 /xvm/t4 /xvm/t5
# zpool add dataPool raidz2 /xvm/t6 /xvm/t7 /xvm/t8 /xvm/t9 /xvm/t10
# zpool add dataPool spare /xvm/s1 /xvm/s2

# zpool status dataPool
  pool: dataPool
 state: ONLINE
 scrub: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        dataPool      ONLINE       0     0     0
          raidz2-0    ONLINE       0     0     0
            /xvm/t1   ONLINE       0     0     0
            /xvm/t2   ONLINE       0     0     0
            /xvm/t3   ONLINE       0     0     0
            /xvm/t4   ONLINE       0     0     0
            /xvm/t5   ONLINE       0     0     0
          raidz2-1    ONLINE       0     0     0
            /xvm/t6   ONLINE       0     0     0
            /xvm/t7   ONLINE       0     0     0
            /xvm/t8   ONLINE       0     0     0
            /xvm/t9   ONLINE       0     0     0
            /xvm/t10  ONLINE       0     0     0
        spares
          /xvm/s1     AVAIL
          /xvm/s2     AVAIL

All is good and it works. I then copied a few gigs of data onto the pool and checked with zpool list:

root@vmstor01:/# zpool list
NAME       SIZE   ALLOC  FREE   CAP  DEDUP  HEALTH  ALTROOT
dataPool   9.94G  4.89G  5.04G  49%  1.00x  ONLINE  -

Now here's what I don't get: why does it say the pool size is 9.94G when it's made up of 2 x raidz2 consisting of 1G volumes? It should only be 6G, which df -h also reports correctly. For a RAIDZ2 pool I find this information (the fact that it's 9.94G and not 5.9G) completely useless and misleading; why is parity part of the calculation? Also, ALLOC seems wrong; there's nothing in the pool except a full copy of /usr (just to fill it up with test data). It does, however, correctly display that I've used about 50% of the pool. This is a build 131 machine, btw.

root@vmstor01:/# df -h /dataPool
Filesystem   Size  Used  Avail  Use%  Mounted on
dataPool     5.9G  3.0G   3.0G   51%  /dataPool

Cheers,

- Lasse
I don't see it as completely useless. Different reports for different things. zpool status is the status of the POOL. It makes PERFECT sense that it would show the raw data stats. zfs list and df show different things because they are different tools for different jobs. It can be confusing at first, but you will find that it IS useful depending on what you are trying to do. ZFS often forces you to look at things differently. Just wait till you have a bunch of compressed and deduped data and snapshots. Then you'll see how useful this is.

On Mon, Feb 8, 2010 at 4:35 PM, Lasse Osterild <lasseoe at unixzone.dk> wrote:
> Now here's what I don't get: why does it say the pool size is 9.94G when
> it's made up of 2 x raidz2 consisting of 1G volumes? It should only be 6G,
> which df -h also reports correctly. [...]
This is a FAQ, but the FAQ is not well maintained :-(
http://hub.opensolaris.org/bin/view/Community+Group+zfs/faq

On Feb 8, 2010, at 1:35 PM, Lasse Osterild wrote:
> Now here's what I don't get: why does it say the pool size is 9.94G when
> it's made up of 2 x raidz2 consisting of 1G volumes? It should only be 6G,
> which df -h also reports correctly. [...]

No, zpool displays the available pool space. df -h displays something else entirely. If you have 10 1GB vdevs, then the total available pool space is 10GB. From the zpool(1m) man page:
...
     size
         Total size of the storage pool.

     These space usage properties report actual physical space
     available to the storage pool. The physical space can be
     different from the total amount of space that any contained
     datasets can actually use. The amount of space used in a
     raidz configuration depends on the characteristics of the
     data being written. In addition, ZFS reserves some space for
     internal accounting that the zfs(1M) command takes into
     account, but the zpool command does not. For non-full pools
     of a reasonable size, these effects should be invisible. For
     small pools, or pools that are close to being completely
     full, these discrepancies may become more noticeable.
...
 -- richard
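To put rough numbers on that for the pool above, here is a small back-of-the-envelope sketch (my own illustration, not ZFS's actual accounting; it ignores labels, the internal reservation and per-block allocation overhead, which is why the real commands show 9.94G and 5.9G rather than a clean 10G and 6G):

# Rough model of the two views of this pool: zpool list counts raw space
# across every leaf vdev, while zfs list / df count what is left for data
# after raidz2 parity.  Real output is a little lower (9.94G and 5.9G)
# because of labels, reserved space and per-block allocation overhead.

vdevs = [
    {"disks": 5, "parity": 2, "disk_gib": 1},   # raidz2-0: /xvm/t1 .. /xvm/t5
    {"disks": 5, "parity": 2, "disk_gib": 1},   # raidz2-1: /xvm/t6 .. /xvm/t10
]

raw    = sum(v["disks"] * v["disk_gib"] for v in vdevs)                  # what zpool list shows
usable = sum((v["disks"] - v["parity"]) * v["disk_gib"] for v in vdevs)  # what zfs list / df show

print(f"raw pool space   : ~{raw} GiB")     # ~10 GiB, reported as 9.94G
print(f"usable data space: ~{usable} GiB")  # ~6 GiB, reported as 5.9G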
On 08/02/2010, at 22.50, Richard Elling wrote:
>> root@vmstor01:/# zpool list
>> NAME       SIZE   ALLOC  FREE   CAP  DEDUP  HEALTH  ALTROOT
>> dataPool   9.94G  4.89G  5.04G  49%  1.00x  ONLINE  -
>>
>> Now here's what I don't get: why does it say the pool size is 9.94G when
>> it's made up of 2 x raidz2 consisting of 1G volumes? It should only be 6G,
>> which df -h also reports correctly.
>
> No, zpool displays the available pool space. df -h displays something else
> entirely. If you have 10 1GB vdevs, then the total available pool space is
> 10GB. From the zpool(1m) man page: [...]
>
> -- richard

Ok, thanks. I know that the amount of used space will vary, but what's the usefulness of the total size when, e.g. in my pool above, 4 x 1G (roughly, depending on recordsize) are reserved for parity? It's not like it's usable for anything else :) I just don't see the point when it's a raidz or raidz2 pool, but I guess I am missing something here.

Cheers,

- Lasse
On Mon, Feb 08, 2010 at 11:28:11PM +0100, Lasse Osterild wrote:
> Ok, thanks. I know that the amount of used space will vary, but what's
> the usefulness of the total size when, e.g. in my pool above, 4 x 1G
> (roughly, depending on recordsize) are reserved for parity? It's not
> like it's usable for anything else :) I just don't see the point
> when it's a raidz or raidz2 pool, but I guess I am missing something
> here.

The basis of raidz is that each block is its own raid stripe, with its own layout. At present, this only matters for the size of the stripe. For example, if I write a single 512-byte block to a dual-parity raidz2, I will write three blocks, to three disks. With a larger block, I will have more data over more disks, until the block is big enough to stripe evenly over all of them. As the block gets bigger yet, more is written to each disk as part of the stripe, and the parity units get bigger to match the size of the largest data unit. This "rounding" can very often mean that different disks have different amounts of data for each stripe.

Crucially, it also means the ratio of parity-to-data is not fixed. This tends to average out on a pool with lots of data and mixed block sizes, but not always; consider an extreme case of a pool containing only datasets with blocksize=512. That's what the comments in the documentation are referring to, and the major reason for the zpool output you see.

In future, it may go further and be more important.

Just as the data count per stripe can vary, there's nothing fundamental in the raidz layout that says that the same parity count and method has to be used for the entire pool, either. Raidz already degrades to simple mirroring in some of the same small-stripe cases discussed above.

There's no particular reason, in theory, why they could not also have different amounts of parity on a per-block basis. I imagine that when bp-rewrite and the ability to reshape pools comes along, this will indeed be the case, at least during transition. As a simple example, when reshaping a raidz1 to a raidz2 by adding a disk, there will be blocks with single parity and other blocks with dual parity for a time until the operation is finished.

Maybe one day in the future, there will just be a basic "raidz" vdev type, and we can set dataset properties for the number of additional parity blocks each should get. This might be a little like we can currently set "copies", including that it would only affect new writes and lead to very mixed redundancy states.

No one has actually said this is a real goal, and the reasons it's not presently allowed include administrative and operational simplicity as well as implementation and testing constraints, but I think it would be handy and cool.

--
Dan.
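If it helps to see that variable parity-to-data ratio numerically, here is a rough sketch of the arithmetic for a 5-disk raidz2 with 512-byte sectors (my own illustration of the sizing rule Dan describes -- parity per row of data columns, plus rounding -- not code taken from ZFS itself):

import math

SECTOR = 512        # assumed sector size
DISKS, PARITY = 5, 2
DATA_COLS = DISKS - PARITY

def raidz2_sectors(block_bytes):
    """Approximate on-disk sectors used by one block on this raidz2 vdev."""
    data = math.ceil(block_bytes / SECTOR)     # data sectors needed
    rows = math.ceil(data / DATA_COLS)         # stripe rows across the data columns
    total = data + PARITY * rows               # two parity sectors per row
    return total + (-total) % (PARITY + 1)     # round up so leftover gaps stay allocatable

for size in (512, 4096, 16384, 131072):
    used = raidz2_sectors(size)
    data = math.ceil(size / SECTOR)
    print(f"{size:>7}-byte block: {used:>3} sectors on disk ({used / data:.2f}x the data)")

# 512-byte blocks cost 3 sectors each (3.00x, effectively mirroring), while
# 128K blocks cost about 1.68x -- so how much of the "9.94G" can really hold
# data depends on what you write, which is why zpool list does not try to
# subtract a fixed parity amount.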
Hi Richard,

I last updated this FAQ on 1/19. Which part is not well-maintained? :-)

Cindy

On 02/08/10 14:50, Richard Elling wrote:
> This is a FAQ, but the FAQ is not well maintained :-(
> http://hub.opensolaris.org/bin/view/Community+Group+zfs/faq
> [...]
On 09/02/2010, at 00.23, Daniel Carosone wrote:
> The basis of raidz is that each block is its own raid stripe, with its
> own layout. [...] Crucially, it also means the ratio of parity-to-data
> is not fixed.

Thanks Dan! :)

That explanation made perfect sense, and I appreciate you taking the time to write it. Perhaps parts of it could go into the FAQ? I realise it's sort of in there already, but it doesn't explain it very well.

Cheers,

- Lasse
Hi Lasse,

I expanded this entry to include more details of the zpool list and zfs list reporting.

See if the new explanation provides enough details.

Thanks,

Cindy

On 02/08/10 16:51, Lasse Osterild wrote:
> Thanks Dan! :)
>
> That explanation made perfect sense, and I appreciate you taking the time
> to write it. Perhaps parts of it could go into the FAQ? I realise it's
> sort of in there already, but it doesn't explain it very well.
On Mon, Feb 08, 2010 at 05:23:29PM -0700, Cindy Swearingen wrote:
> Hi Lasse,
>
> I expanded this entry to include more details of the zpool list and
> zfs list reporting.
>
> See if the new explanation provides enough details.

Cindy, feel free to crib from or refer to my text in whatever way might help.

--
Dan.