I'm in the process of replacing drives in a pool, and the resilver
times seem to have increased with each device. The way that I'm doing
this is by pulling a drive, physically replacing it, then doing
'cfgadm -c configure ____ ; zpool replace tank ____'. I don't have any
hot-swap bays available, so I'm physically replacing the device before
doing a 'zpool replace'.

I'm replacing Western Digital WD10EADS 1TB drives with Hitachi 5K3000
3TB drives. Neither device is fast, but they aren't THAT slow. wsvc_t
and asvc_t both look fairly healthy given the device types.

Replacing the first device (took about 20 hours) went about as
expected. The second took about 44 hours. The third is still running
and should finish in slightly over 48 hours.

I'm wondering if the following would help for the next drive:
# zpool offline tank c2t4d0
# cfgadm -c unconfigure sata3/4::dsk/c2t4d0

At this point pull the drive and put it into an external USB adapter.
Put the new drive in the hot-swap bay. The USB adapter shows up as
c4t0d0.

# zpool online tank c4t0d0

This should re-add it to the pool and resilver the last few
transactions that may have been missed, right?

Then I want to actually replace the drive in the zpool:
# cfgadm -c configure sata3/4
# zpool replace tank c4t0d0 c2t4d0

Will this work? Will the replace go faster, since it won't need to
resilver from the parity data?

$ zpool list tank
NAME   SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH    ALTROOT
tank  7.25T  6.40T   867G    88%  1.11x  DEGRADED  -

$ zpool status -x
  pool: tank
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Sat Apr 23 17:03:13 2011
    5.91T scanned out of 6.40T at 38.0M/s, 3h42m to go
    752G resilvered, 92.43% done
config:

        NAME              STATE     READ WRITE CKSUM
        tank              DEGRADED     0     0     0
          raidz2-0        DEGRADED     0     0     0
            c2t0d0        ONLINE       0     0     0
            c2t1d0        ONLINE       0     0     0
            c2t2d0        ONLINE       0     0     0
            c2t3d0        ONLINE       0     0     0
            c2t4d0        ONLINE       0     0     0
            replacing-5   DEGRADED     0     0     0
              c2t5d0/old  FAULTED      0     0     0  corrupted data
              c2t5d0      ONLINE       0     0     0  (resilvering)
            c2t6d0        ONLINE       0     0     0
            c2t7d0        ONLINE       0     0     0

errors: No known data errors

$ zpool iostat -v tank 60 3
                    capacity     operations    bandwidth
pool             alloc   free   read  write   read  write
---------------  -----  -----  -----  -----  -----  -----
tank             6.40T   867G    566     25  32.2M   156K
  raidz2         6.40T   867G    566     25  32.2M   156K
    c2t0d0           -      -    362     11  5.56M  71.6K
    c2t1d0           -      -    365     11  5.56M  71.6K
    c2t2d0           -      -    363     11  5.56M  71.6K
    c2t3d0           -      -    363     11  5.56M  71.6K
    c2t4d0           -      -    361     11  5.54M  71.6K
    replacing        -      -      0    492  8.28K  4.79M
      c2t5d0/old     -      -    202      5  2.84M  36.7K
      c2t5d0         -      -      0    315  8.66K  4.78M
    c2t6d0           -      -    170    190  2.68M  2.69M
    c2t7d0           -      -    386     10  5.53M  71.6K
---------------  -----  -----  -----  -----  -----  -----

                    capacity     operations    bandwidth
pool             alloc   free   read  write   read  write
---------------  -----  -----  -----  -----  -----  -----
tank             6.40T   867G    612     14  8.43M  70.7K
  raidz2         6.40T   867G    612     14  8.43M  70.7K
    c2t0d0           -      -    411     11  1.51M  57.9K
    c2t1d0           -      -    414     11  1.50M  58.0K
    c2t2d0           -      -    385     11  1.51M  57.9K
    c2t3d0           -      -    412     11  1.50M  58.0K
    c2t4d0           -      -    412     11  1.45M  57.8K
    replacing        -      -      0    574    366   852K
      c2t5d0/old     -      -      0      0      0      0
      c2t5d0         -      -      0    324    366   852K
    c2t6d0           -      -    427     11  1.45M  57.8K
    c2t7d0           -      -    431     11  1.49M  57.9K
---------------  -----  -----  -----  -----  -----  -----

                    capacity     operations    bandwidth
pool             alloc   free   read  write   read  write
---------------  -----  -----  -----  -----  -----  -----
tank             6.40T   867G  1.02K     12  11.1M  69.4K
  raidz2         6.40T   867G  1.02K     12  11.1M  69.4K
    c2t0d0           -      -    772     10  1.99M  59.3K
    c2t1d0           -      -    771     10  1.99M  59.4K
    c2t2d0           -      -    743     10  2.02M  59.4K
    c2t3d0           -      -    771     11  2.01M  59.3K
    c2t4d0           -      -    767     10  1.94M  59.1K
    replacing        -      -      0  1.00K     17  1.48M
      c2t5d0/old     -      -      0      0      0      0
      c2t5d0         -      -      0    533     17  1.48M
    c2t6d0           -      -    791     10  1.98M  59.2K
    c2t7d0           -      -    796     10  1.99M  59.3K
---------------  -----  -----  -----  -----  -----  -----

$ iostat -xn 60 3
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  362.4   11.5 5693.9   71.6  0.7  0.7    2.0    2.0  14  30 c2t0d0
  365.3   11.5 5689.0   71.6  0.7  0.7    1.8    1.9  14  29 c2t1d0
  363.2   11.5 5693.2   71.6  0.7  0.7    1.9    2.0  14  30 c2t2d0
  364.0   11.5 5692.7   71.6  0.7  0.7    1.9    1.9  14  30 c2t3d0
  361.2   11.5 5672.8   71.6  0.7  0.7    1.9    1.9  14  30 c2t4d0
  202.4  163.1 2915.2 2475.3  0.3  1.1    0.8    2.9   7  26 c2t5d0
  170.4  190.4 2747.3 2757.6  0.5  1.3    1.5    3.6  11  31 c2t6d0
  386.4   11.2 5659.0   71.6  0.5  0.6    1.3    1.5  12  27 c2t7d0
   95.0    1.2   94.5   16.1  0.0  0.0    0.2    0.2   0   1 c0t0d0
    0.9    1.2    3.3   16.1  0.0  0.0    7.5    1.9   0   0 c0t1d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  514.1   13.0 1937.7   65.7  0.2  0.8    0.3    1.5   5  27 c2t0d0
  510.1   13.2 1943.1   65.7  0.2  0.8    0.5    1.6   6  29 c2t1d0
  513.3   13.2 1926.3   65.8  0.2  0.8    0.3    1.5   5  28 c2t2d0
  505.9   13.3 1936.7   65.8  0.2  0.9    0.3    1.8   5  30 c2t3d0
  513.8   12.8 1890.1   65.8  0.2  0.8    0.3    1.5   5  26 c2t4d0
    0.1  488.6    0.1 1216.5  0.0  2.2    0.0    4.6   0  33 c2t5d0
  533.3   12.7 1875.3   65.9  0.1  0.7    0.2    1.3   4  24 c2t6d0
  541.6   12.9 1923.2   65.8  0.1  0.7    0.2    1.2   3  23 c2t7d0
    0.0    2.0    0.0    9.4  0.0  0.0    1.0    0.2   0   0 c0t0d0
    0.0    2.0    0.0    9.4  0.0  0.0    1.0    0.2   0   0 c0t1d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  506.7    9.2 1906.9   50.2  0.6  0.2    1.2    0.5  20  23 c2t0d0
  509.8    9.3 1909.5   50.2  0.6  0.2    1.2    0.4  19  23 c2t1d0
  508.6    9.0 1900.4   50.2  0.7  0.3    1.4    0.5  21  25 c2t2d0
  506.8    9.4 1897.2   50.3  0.6  0.2    1.2    0.5  19  23 c2t3d0
  505.1    9.4 1852.4   50.4  0.6  0.2    1.2    0.5  19  23 c2t4d0
    0.0  487.6    0.0 1227.9  0.0  3.5    0.0    7.2   0  46 c2t5d0
  534.8    9.2 1855.6   50.2  0.6  0.2    1.0    0.4  18  22 c2t6d0
  540.5    9.3 1891.4   50.2  0.5  0.2    1.0    0.4  17  21 c2t7d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c0t0d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c0t1d0

-- 
Brandon High : bhigh at freaks.com
On Apr 25, 2011, at 2:52 PM, Brandon High wrote:
> I'm in the process of replacing drives in a pool, and the resilver
> times seem to have increased with each device. The way that I'm doing
> this is by pulling a drive, physically replacing it, then doing
> 'cfgadm -c configure ____ ; zpool replace tank ____'. I don't have any
> hot-swap bays available, so I'm physically replacing the device before
> doing a 'zpool replace'.
>
> I'm replacing Western Digital WD10EADS 1TB drives with Hitachi 5K3000
> 3TB drives. Neither device is fast, but they aren't THAT slow. wsvc_t
> and asvc_t both look fairly healthy given the device types.

Look for 10-12 ms for asvc_t. In my experience, SATA disks tend to not
handle NCQ as well as SCSI disks handle TCQ -- go figure. In your iostats
below, you are obviously not bottlenecking on the disks.

> Replacing the first device (took about 20 hours) went about as
> expected. The second took about 44 hours. The third is still running
> and should finish in slightly over 48 hours.

If there is other work going on, then you might be hitting the resilver
throttle. By default, it will delay 2 clock ticks, if needed. It can be
turned off temporarily using:

echo zfs_resilver_delay/W0t0 | mdb -kw

to return to normal:

echo zfs_resilver_delay/W0t2 | mdb -kw

> I'm wondering if the following would help for the next drive:
> # zpool offline tank c2t4d0
> # cfgadm -c unconfigure sata3/4::dsk/c2t4d0
>
> At this point pull the drive and put it into an external USB adapter.
> Put the new drive in the hot-swap bay. The USB adapter shows up as
> c4t0d0.
>
> # zpool online tank c4t0d0
>
> This should re-add it to the pool and resilver the last few
> transactions that may have been missed, right?
>
> Then I want to actually replace the drive in the zpool:
> # cfgadm -c configure sata3/4
> # zpool replace tank c4t0d0 c2t4d0
>
> Will this work? Will the replace go faster, since it won't need to
> resilver from the parity data?

Probably won't work because it does not make the resilvering drive any
faster.
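If you want to double-check what the throttle is set to before and after
poking it, something like this should print the current value in decimal
(illustrative only, assuming the same mdb -k access):

echo zfs_resilver_delay/D | mdb -k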
 -- richard
Fred Liu
2011-Apr-25 23:53 UTC
[zfs-discuss] How does ZFS dedup space accounting work with quota?
Cindy,

The following is quoted from the ZFS Dedup FAQ:

"Deduplicated space accounting is reported at the pool level. You must use
the zpool list command rather than the zfs list command to identify disk
space consumption when dedup is enabled. If you use the zfs list command
to review deduplicated space, you might see that the file system appears
to be increasing because we're able to store more data on the same
physical device. Using the zpool list will show you how much physical
space is being consumed and it will also show you the dedup ratio. The df
command is not dedup-aware and will not provide accurate space
accounting."

So how can I set the quota size on a file system with dedup enabled?

Thanks.

Fred
On Mon, Apr 25, 2011 at 4:45 PM, Richard Elling
<richard.elling at gmail.com> wrote:
> If there is other work going on, then you might be hitting the resilver
> throttle. By default, it will delay 2 clock ticks, if needed.

There is some other access to the pool from nfs and cifs clients, but not
much, and mostly reads.

Setting zfs_resilver_delay seems to have helped some, based on the iostat
output. Are there other tunables?

> Probably won't work because it does not make the resilvering drive
> any faster.

It doesn't seem like the devices are the bottleneck, even with the delay
turned off.

$ iostat -xn 60 3
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  369.2   11.5 5577.0   71.3  0.7  0.7    1.9    1.9  14  29 c2t0d0
  371.9   11.5 5570.3   71.3  0.7  0.7    1.7    1.8  13  29 c2t1d0
  369.9   11.5 5574.4   71.3  0.7  0.7    1.8    1.9  14  29 c2t2d0
  370.7   11.5 5573.9   71.3  0.7  0.7    1.8    1.9  14  29 c2t3d0
  368.0   11.5 5553.1   71.3  0.7  0.7    1.8    1.9  14  29 c2t4d0
  196.1  172.8 2825.5 2436.6  0.3  1.1    0.8    3.0   6  26 c2t5d0
  183.6  184.9 2717.6 2674.7  0.5  1.3    1.4    3.5  11  31 c2t6d0
  393.0   11.2 5540.7   71.3  0.5  0.6    1.3    1.5  12  26 c2t7d0
   95.8    1.2   95.6   16.2  0.0  0.0    0.2    0.2   0   1 c0t0d0
    0.9    1.2    3.6   16.2  0.0  0.0    7.5    1.9   0   0 c0t1d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  891.2   11.8 2386.9   64.4  0.0  1.2    0.0    1.3   1  36 c2t0d0
  919.9   12.1 2351.8   64.6  0.0  1.1    0.0    1.2   0  35 c2t1d0
  906.9   12.1 2346.1   64.6  0.0  1.2    0.0    1.3   0  36 c2t2d0
  877.9   11.6 2351.0   64.5  0.7  0.5    0.8    0.6  23  35 c2t3d0
  883.4   12.0 2322.0   64.4  0.2  1.0    0.2    1.1   7  35 c2t4d0
    0.8  758.0    0.8 1910.4  0.2  5.0    0.2    6.6   3  72 c2t5d0
  882.7   11.4 2355.1   64.4  0.8  0.4    0.9    0.4  27  34 c2t6d0
  907.8   11.4 2373.1   64.5  0.7  0.3    0.8    0.4  23  30 c2t7d0
 1607.8    9.4 1568.2   83.0  0.1  0.2    0.1    0.1   3  18 c0t0d0
    7.3    9.1   23.5   83.0  0.1  0.0    6.0    1.4   2   2 c0t1d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  960.3   12.7 2868.0   59.0  1.1  0.7    1.2    0.8  37  52 c2t0d0
  963.2   12.7 2877.5   59.1  1.1  0.8    1.1    0.8  36  51 c2t1d0
  960.3   12.6 2844.7   59.1  1.1  0.7    1.1    0.8  37  52 c2t2d0
 1000.1   12.8 2827.1   59.0  0.6  1.2    0.6    1.2  21  52 c2t3d0
  960.9   12.3 2811.1   59.0  1.3  0.6    1.3    0.6  42  51 c2t4d0
    0.5  962.2    0.4 2418.3  0.0  4.1    0.0    4.3   0  59 c2t5d0
 1014.2   12.3 2820.6   59.1  0.8  0.8    0.8    0.8  28  48 c2t6d0
 1031.2   12.5 2822.0   59.1  0.8  0.8    0.7    0.8  26  45 c2t7d0
 1836.4    0.0 1783.4    0.0  0.0  0.2    0.0    0.1   1  19 c0t0d0
    5.3    0.0    5.3    0.0  0.0  0.0    1.1    1.5   1   1 c0t1d0

-- 
Brandon High : bhigh at freaks.com
Brandon High
2011-Apr-26 00:50 UTC
[zfs-discuss] How does ZFS dedup space accounting work with quota?
On Mon, Apr 25, 2011 at 4:53 PM, Fred Liu <Fred_Liu at issi.com> wrote:
> So how can I set the quota size on a file system with dedup enabled?

I believe the quota applies to the non-dedup'd data size. If a user
stores 10G of data, it will use 10G of quota, regardless of whether it
dedups at 100:1 or 1:1.

-B

-- 
Brandon High : bhigh at freaks.com
Fred Liu
2011-Apr-26 01:13 UTC
[zfs-discuss] How does ZFS dedup space accounting work with quota?
Hmmmm, it seems dedup is pool-based not filesystem-based.
If it can have fine-grained granularity (like based on fs), that will be
great! It is a pity! NetApp is sweet in this aspect.

Thanks.

Fred

> -----Original Message-----
> From: Brandon High [mailto:bhigh at freaks.com]
> Sent: Tuesday, April 26, 2011 8:50 AM
> To: Fred Liu
> Cc: cindy.swearingen at oracle.com; ZFS discuss
> Subject: Re: [zfs-discuss] How does ZFS dedup space accounting work
> with quota?
>
> On Mon, Apr 25, 2011 at 4:53 PM, Fred Liu <Fred_Liu at issi.com> wrote:
> > So how can I set the quota size on a file system with dedup enabled?
>
> I believe the quota applies to the non-dedup'd data size. If a user
> stores 10G of data, it will use 10G of quota, regardless of whether it
> dedups at 100:1 or 1:1.
>
> -B
>
> --
> Brandon High : bhigh at freaks.com
Ian Collins
2011-Apr-26 01:23 UTC
[zfs-discuss] How does ZFS dedup space accounting work with quota?
On 04/26/11 01:13 PM, Fred Liu wrote:
> Hmmmm, it seems dedup is pool-based not filesystem-based.

That's correct.  Although it can be turned off and on at the filesystem
level (assuming it is enabled for the pool).

> If it can have fine-grained granularity (like based on fs), that will
> be great! It is a pity! NetApp is sweet in this aspect.

So what happens to user B's quota if user B stores a ton of data that is
a duplicate of user A's data and then user A deletes the original?

-- 
Ian.
On Mon, Apr 25, 2011 at 5:26 PM, Brandon High <bhigh at freaks.com> wrote:
> Setting zfs_resilver_delay seems to have helped some, based on the
> iostat output. Are there other tunables?

I found zfs_resilver_min_time_ms while looking. I've tried bumping it up
considerably, without much change. 'zpool status' is still showing:

  scan: resilver in progress since Sat Apr 23 17:03:13 2011
    6.06T scanned out of 6.40T at 36.0M/s, 2h46m to go
    769G resilvered, 94.64% done

'iostat -xn' shows asvc_t under 10ms still.

Increasing the per-device queue depth has increased the asvc_t but hasn't
done much to affect the throughput. I'm using:

echo zfs_vdev_max_pending/W0t35 | pfexec mdb -kw
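If I need to back any of this out later, I believe the defaults are 2 for
zfs_resilver_delay, 3000 for zfs_resilver_min_time_ms and 10 for
zfs_vdev_max_pending on recent builds -- don't hold me to those numbers --
so roughly:

echo zfs_resilver_delay/W0t2 | pfexec mdb -kw
echo zfs_resilver_min_time_ms/W0t3000 | pfexec mdb -kw
echo zfs_vdev_max_pending/W0t10 | pfexec mdb -kw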
-B

-- 
Brandon High : bhigh at freaks.com

Erik Trimble
2011-Apr-26 04:47 UTC
[zfs-discuss] How does ZFS dedup space accounting work with quota?
On 4/25/2011 6:23 PM, Ian Collins wrote:
> On 04/26/11 01:13 PM, Fred Liu wrote:
>> Hmmmm, it seems dedup is pool-based not filesystem-based.
> That's correct.  Although it can be turned off and on at the filesystem
> level (assuming it is enabled for the pool).

Which is effectively the same as choosing per-filesystem dedup. Just the
inverse. You turn it on at the pool level, and off at the filesystem
level, which is identical to "off at the pool level, on at the filesystem
level" that NetApp does.

>> If it can have fine-grained granularity (like based on fs), that will
>> be great! It is a pity! NetApp is sweet in this aspect.
>
> So what happens to user B's quota if user B stores a ton of data that is
> a duplicate of user A's data and then user A deletes the original?

Actually, right now, nothing happens to B's quota. He's always charged
the un-deduped amount for his quota usage, whether or not dedup is
enabled, and regardless of how much of his data is actually deduped.
Which is as it should be, as quotas are about limiting how much a user is
consuming, not how much the backend needs to store that data consumption.

e.g.

A, B, C, & D all have 100MB of data in the pool, with dedup on.

20MB of storage has a dedup factor of 3:1 (common to A, B, & C)
50MB of storage has a dedup factor of 2:1 (common to A & B)

Thus, the amount of unique data would be:

A: 100 - 20 - 50 = 30MB
B: 100 - 20 - 50 = 30MB
C: 100 - 20 = 80MB
D: 100MB

Summing it all up, you would have an actual storage consumption of 70MB
(the 50+20 that dedups) + 30+30+80+100 (unique data) = 310MB of actual
storage for 400MB of apparent storage (i.e. a dedup ratio of 1.29:1).

A, B, C, & D would each still have a quota usage of 100MB.

-- 
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA
Ian Collins
2011-Apr-26 08:27 UTC
[zfs-discuss] How does ZFS dedup space accounting work with quota?
On 04/26/11 04:47 PM, Erik Trimble wrote:
> Actually, right now, nothing happens to B's quota. He's always charged
> the un-deduped amount for his quota usage, whether or not dedup is
> enabled, and regardless of how much of his data is actually deduped.
> Which is as it should be, as quotas are about limiting how much a user
> is consuming, not how much the backend needs to store that data
> consumption.

That was the point I was making: quota on deduped usage does not make
sense.  I was curious how he proposed doing it the other way!

-- 
Ian.
Fred Liu
2011-Apr-26 10:59 UTC
[zfs-discuss] How does ZFS dedup space accounting work with quota?
> -----Original Message-----
> From: Erik Trimble [mailto:erik.trimble at oracle.com]
> Sent: Tuesday, April 26, 2011 12:47 PM
> To: Ian Collins
> Cc: Fred Liu; ZFS discuss
> Subject: Re: [zfs-discuss] How does ZFS dedup space accounting work
> with quota?
>
> Which is effectively the same as choosing per-filesystem dedup. Just the
> inverse. You turn it on at the pool level, and off at the filesystem
> level, which is identical to "off at the pool level, on at the
> filesystem level" that NetApp does.

My original thought is just enabling dedup on one file system to check if
it is mature enough or not in the production env. And I have only one
pool. If dedup is filesystem-based, the effect of dedup will be throttled
within one file system and won't propagate to the whole pool. Just
disabling dedup cannot get rid of all the effects (such as the possible
performance degradation, etc.), because the already dedup'd data is still
there and the DDT is still there. The thinkable thorough way is totally
removing all the dedup'd data. But is it the real thorough way?

And also the dedup space saving is kind of indirect. We cannot directly
get the space saving in the file system where dedup is actually enabled,
for it is pool-based. Even from the pool perspective, it is still sort of
indirect and obscure in my opinion: the real space saving is the delta
between the output of 'zpool list' and the sum of 'du' on all the folders
in the pool (or 'df' on the mount point folder -- not sure if a percentage
like 123% will occur or not... grinning ^:^). But in NetApp, we can use
'df -s' to directly and easily get the space saving.

> Actually, right now, nothing happens to B's quota. He's always charged
> the un-deduped amount for his quota usage, whether or not dedup is
> enabled, and regardless of how much of his data is actually deduped.

It is true, quota is in charge of logical data not physical data.
Let's assume an interesting scenario -- say the pool is 100% full in
logical data (such as 'df' tells you 100% used) but not full in physical
data (such as 'zpool list' tells you still some space available). Can we
continue writing data into this pool?
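Maybe something like this on a small throwaway file-backed pool would show
it -- the names here are made up and I have not actually tried it yet:

# mkfile 200m /var/tmp/vdev1
# zpool create testdedup /var/tmp/vdev1
# zfs set dedup=on testdedup
# dd if=/dev/urandom of=/testdedup/seed bs=128k count=400
# cp /testdedup/seed /testdedup/copy1
# cp /testdedup/seed /testdedup/copy2
(keep making copies -- they should dedup against the seed, so what 'df'
and 'zpool list' report will drift apart -- and then see whether one more
copy still succeeds)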
Anybody interested in doing this experiment? ;-)

Thanks.

Fred
Erik Trimble
2011-Apr-26 16:07 UTC
[zfs-discuss] How does ZFS dedup space accounting work with quota?
On 4/26/2011 3:59 AM, Fred Liu wrote:
> My original thought is just enabling dedup on one file system to check
> if it is mature enough or not in the production env. And I have only
> one pool. If dedup is filesystem-based, the effect of dedup will be
> throttled within one file system and won't propagate to the whole pool.
> Just disabling dedup cannot get rid of all the effects (such as the
> possible performance degradation, etc.), because the already dedup'd
> data is still there and the DDT is still there. The thinkable thorough
> way is totally removing all the dedup'd data. But is it the real
> thorough way?

You can do that now. Enable Dedup at the pool level. Turn it OFF on all
the existing filesystems. Make a new "test" filesystem, and run your
tests.

Remember, only data written AFTER the dedup value is turned on will be
de-duped. Existing data will NOT. And, though dedup is enabled at the
pool level, it will only consider data written into filesystems that have
the dedup value as ON. Thus, in your case, writing to the single
filesystem with dedup on will NOT have ZFS check for duplicates from the
other filesystems. It will check only inside itself, as it's the only
filesystem with dedup enabled.

If the experiment fails, you can safely destroy your test dedup
filesystem, then unset dedup at the pool level, and you're fine.
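In command form, that's roughly the following. The dataset names
(tank/data, tank/home, tank/dedup-test) are made up, so substitute your
own; the new filesystem inherits dedup=on from the pool's root dataset,
and the final 'zfs inherit' drops the root back to the default (off):

# zfs set dedup=on tank
# zfs set dedup=off tank/data
# zfs set dedup=off tank/home
  (repeat dedup=off for each existing filesystem)
# zfs create tank/dedup-test
  (run the test workload in tank/dedup-test)
# zfs destroy tank/dedup-test
# zfs inherit dedup tank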
> And also the dedup space saving is kind of indirect. We cannot directly
> get the space saving in the file system where dedup is actually enabled,
> for it is pool-based. Even from the pool perspective, it is still sort
> of indirect and obscure in my opinion: the real space saving is the
> delta between the output of 'zpool list' and the sum of 'du' on all the
> folders in the pool (or 'df' on the mount point folder -- not sure if a
> percentage like 123% will occur or not... grinning ^:^). But in NetApp,
> we can use 'df -s' to directly and easily get the space saving.

That is true. Honestly, however, it would be hard to do this on a
per-filesystem basis. ZFS allows for the creation of an arbitrary number
of filesystems in a pool, far higher than NetApp does. The result is that
the "filesystem" concept is much more flexible in ZFS. The downside is
that keeping dedup statistics for a given arbitrary set of data is
logistically difficult.

An analogy with NetApp is thus: Can you use any tool to find the dedup
ratio of an arbitrary directory tree INSIDE a NetApp filesystem?

> It is true, quota is in charge of logical data not physical data.
> Let's assume an interesting scenario -- say the pool is 100% full in
> logical data (such as 'df' tells you 100% used) but not full in physical
> data (such as 'zpool list' tells you still some space available). Can we
> continue writing data into this pool?

Sure, you can keep writing to the volume. What matters to the OS is what
*it* thinks, not what some userland app thinks.

-- 
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA
Fred Liu
2011-Apr-26 16:29 UTC
[zfs-discuss] How does ZFS dedup space accounting work with quota?
> -----Original Message-----
> From: Erik Trimble [mailto:erik.trimble at oracle.com]
> Sent: Wednesday, April 27, 2011 12:07 AM
> To: Fred Liu
> Cc: Ian Collins; ZFS discuss
> Subject: Re: [zfs-discuss] How does ZFS dedup space accounting work
> with quota?
>
> You can do that now. Enable Dedup at the pool level. Turn it OFF on all
> the existing filesystems. Make a new "test" filesystem, and run your
> tests.

Thanks. I will have a try.

> An analogy with NetApp is thus: Can you use any tool to find the dedup
> ratio of an arbitrary directory tree INSIDE a NetApp filesystem?

That is true. There is no apple-to-apple corresponding terminology in
NetApp for a ZFS file system. If we take a NetApp 'volume' as the
counterpart of a ZFS 'file system', then that is doable, because dedup in
NetApp is volume-based.

> Sure, you can keep writing to the volume. What matters to the OS is what
> *it* thinks, not what some userland app thinks.

OK. And then what will the output of 'df' be?

Thanks.

Fred
Erik Trimble
2011-Apr-26 17:05 UTC
[zfs-discuss] How does ZFS dedup space accounting work with quota?
On 4/26/2011 9:29 AM, Fred Liu wrote:
>> Sure, you can keep writing to the volume. What matters to the OS is
>> what *it* thinks, not what some userland app thinks.
>
> OK. And then what will the output of 'df' be?

110% full. Or whatever. df will just keep reporting what it sees. Even if
what it *thinks* doesn't make sense to the human reading it.

-- 
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA
The last resilver finished after 50 hours. Ouch.

I'm onto the next device now, which seems to be progressing much, much
better. The tunings that I'm using right now are:

echo zfs_resilver_delay/W0t0 | mdb -kw
echo zfs_resilver_min_time_ms/W0t20000 | pfexec mdb -kw

Things could slow down, but at 13 hours in, the resilver has been managing
~100M/s and is 70% done.

-B

-- 
Brandon High : bhigh at freaks.com
Fred Liu
2011-Apr-26 22:31 UTC
[zfs-discuss] How does ZFS dedup space accounting work with quota?
> -----Original Message-----
> From: Erik Trimble [mailto:erik.trimble at oracle.com]
> Sent: Wednesday, April 27, 2011 1:06 AM
> To: Fred Liu
> Cc: Ian Collins; ZFS discuss
> Subject: Re: [zfs-discuss] How does ZFS dedup space accounting work
> with quota?
>
> 110% full. Or whatever. df will just keep reporting what it sees. Even
> if what it *thinks* doesn't make sense to the human reading it.

Gotcha!

Thanks.

Fred