I'm in the process of replacing drives in a pool, and the resilver
times seem to have increased with each device. The way that I'm doing
this is by pulling a drive, physically replacing it, then doing
'cfgadm -c configure ____ ; zpool replace tank ____'. I don't have any
hot-swap bays available, so I'm physically replacing the device before
doing a 'zpool replace'.

I'm replacing Western Digital WD10EADS 1TB drives with Hitachi 5K3000
3TB drives. Neither device is fast, but they aren't THAT slow. wsvc_t
and asvc_t both look fairly healthy given the device types.

Replacing the first device (took about 20 hours) went about as
expected. The second took about 44 hours. The third is still running
and should finish in slightly over 48 hours.

I'm wondering if the following would help for the next drive:
# zpool offline tank c2t4d0
# cfgadm -c unconfigure sata3/4::dsk/c2t4d0

At this point pull the drive and put it into an external USB adapter.
Put the new drive in the hot-swap bay. The USB adapter shows up as
c4t0d0.

# zpool online tank c4t0d0

This should re-add it to the pool and resilver the last few
transactions that may have been missed, right?

Then I want to actually replace the drive in the zpool:
# cfgadm -c configure sata3/4
# zpool replace tank c4t0d0 c2t4d0

Will this work? Will the replace go faster, since it won't need to
resilver from the parity data?

$ zpool list tank
NAME   SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH    ALTROOT
tank  7.25T  6.40T   867G    88%  1.11x  DEGRADED  -

$ zpool status -x
  pool: tank
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Sat Apr 23 17:03:13 2011
    5.91T scanned out of 6.40T at 38.0M/s, 3h42m to go
    752G resilvered, 92.43% done
config:

        NAME              STATE     READ WRITE CKSUM
        tank              DEGRADED     0     0     0
          raidz2-0        DEGRADED     0     0     0
            c2t0d0        ONLINE       0     0     0
            c2t1d0        ONLINE       0     0     0
            c2t2d0        ONLINE       0     0     0
            c2t3d0        ONLINE       0     0     0
            c2t4d0        ONLINE       0     0     0
            replacing-5   DEGRADED     0     0     0
              c2t5d0/old  FAULTED      0     0     0  corrupted data
              c2t5d0      ONLINE       0     0     0  (resilvering)
            c2t6d0        ONLINE       0     0     0
            c2t7d0        ONLINE       0     0     0

errors: No known data errors

$ zpool iostat -v tank 60 3
                    capacity     operations    bandwidth
pool             alloc   free   read  write   read  write
---------------  -----  -----  -----  -----  -----  -----
tank             6.40T   867G    566     25  32.2M   156K
  raidz2         6.40T   867G    566     25  32.2M   156K
    c2t0d0           -      -    362     11  5.56M  71.6K
    c2t1d0           -      -    365     11  5.56M  71.6K
    c2t2d0           -      -    363     11  5.56M  71.6K
    c2t3d0           -      -    363     11  5.56M  71.6K
    c2t4d0           -      -    361     11  5.54M  71.6K
    replacing        -      -      0    492  8.28K  4.79M
      c2t5d0/old     -      -    202      5  2.84M  36.7K
      c2t5d0         -      -      0    315  8.66K  4.78M
    c2t6d0           -      -    170    190  2.68M  2.69M
    c2t7d0           -      -    386     10  5.53M  71.6K
---------------  -----  -----  -----  -----  -----  -----

                    capacity     operations    bandwidth
pool             alloc   free   read  write   read  write
---------------  -----  -----  -----  -----  -----  -----
tank             6.40T   867G    612     14  8.43M  70.7K
  raidz2         6.40T   867G    612     14  8.43M  70.7K
    c2t0d0           -      -    411     11  1.51M  57.9K
    c2t1d0           -      -    414     11  1.50M  58.0K
    c2t2d0           -      -    385     11  1.51M  57.9K
    c2t3d0           -      -    412     11  1.50M  58.0K
    c2t4d0           -      -    412     11  1.45M  57.8K
    replacing        -      -      0    574    366   852K
      c2t5d0/old     -      -      0      0      0      0
      c2t5d0         -      -      0    324    366   852K
    c2t6d0           -      -    427     11  1.45M  57.8K
    c2t7d0           -      -    431     11  1.49M  57.9K
---------------  -----  -----  -----  -----  -----  -----

                    capacity     operations    bandwidth
pool             alloc   free   read  write   read  write
---------------  -----  -----  -----  -----  -----  -----
tank             6.40T   867G  1.02K     12  11.1M  69.4K
  raidz2         6.40T   867G  1.02K     12  11.1M  69.4K
    c2t0d0           -      -    772     10  1.99M  59.3K
    c2t1d0           -      -    771     10  1.99M  59.4K
    c2t2d0           -      -    743     10  2.02M  59.4K
    c2t3d0           -      -    771     11  2.01M  59.3K
    c2t4d0           -      -    767     10  1.94M  59.1K
    replacing        -      -      0  1.00K     17  1.48M
      c2t5d0/old     -      -      0      0      0      0
      c2t5d0         -      -      0    533     17  1.48M
    c2t6d0           -      -    791     10  1.98M  59.2K
    c2t7d0           -      -    796     10  1.99M  59.3K
---------------  -----  -----  -----  -----  -----  -----

$ iostat -xn 60 3
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  362.4   11.5 5693.9   71.6  0.7  0.7    2.0    2.0  14  30 c2t0d0
  365.3   11.5 5689.0   71.6  0.7  0.7    1.8    1.9  14  29 c2t1d0
  363.2   11.5 5693.2   71.6  0.7  0.7    1.9    2.0  14  30 c2t2d0
  364.0   11.5 5692.7   71.6  0.7  0.7    1.9    1.9  14  30 c2t3d0
  361.2   11.5 5672.8   71.6  0.7  0.7    1.9    1.9  14  30 c2t4d0
  202.4  163.1 2915.2 2475.3  0.3  1.1    0.8    2.9   7  26 c2t5d0
  170.4  190.4 2747.3 2757.6  0.5  1.3    1.5    3.6  11  31 c2t6d0
  386.4   11.2 5659.0   71.6  0.5  0.6    1.3    1.5  12  27 c2t7d0
   95.0    1.2   94.5   16.1  0.0  0.0    0.2    0.2   0   1 c0t0d0
    0.9    1.2    3.3   16.1  0.0  0.0    7.5    1.9   0   0 c0t1d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  514.1   13.0 1937.7   65.7  0.2  0.8    0.3    1.5   5  27 c2t0d0
  510.1   13.2 1943.1   65.7  0.2  0.8    0.5    1.6   6  29 c2t1d0
  513.3   13.2 1926.3   65.8  0.2  0.8    0.3    1.5   5  28 c2t2d0
  505.9   13.3 1936.7   65.8  0.2  0.9    0.3    1.8   5  30 c2t3d0
  513.8   12.8 1890.1   65.8  0.2  0.8    0.3    1.5   5  26 c2t4d0
    0.1  488.6    0.1 1216.5  0.0  2.2    0.0    4.6   0  33 c2t5d0
  533.3   12.7 1875.3   65.9  0.1  0.7    0.2    1.3   4  24 c2t6d0
  541.6   12.9 1923.2   65.8  0.1  0.7    0.2    1.2   3  23 c2t7d0
    0.0    2.0    0.0    9.4  0.0  0.0    1.0    0.2   0   0 c0t0d0
    0.0    2.0    0.0    9.4  0.0  0.0    1.0    0.2   0   0 c0t1d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  506.7    9.2 1906.9   50.2  0.6  0.2    1.2    0.5  20  23 c2t0d0
  509.8    9.3 1909.5   50.2  0.6  0.2    1.2    0.4  19  23 c2t1d0
  508.6    9.0 1900.4   50.2  0.7  0.3    1.4    0.5  21  25 c2t2d0
  506.8    9.4 1897.2   50.3  0.6  0.2    1.2    0.5  19  23 c2t3d0
  505.1    9.4 1852.4   50.4  0.6  0.2    1.2    0.5  19  23 c2t4d0
    0.0  487.6    0.0 1227.9  0.0  3.5    0.0    7.2   0  46 c2t5d0
  534.8    9.2 1855.6   50.2  0.6  0.2    1.0    0.4  18  22 c2t6d0
  540.5    9.3 1891.4   50.2  0.5  0.2    1.0    0.4  17  21 c2t7d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c0t0d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c0t1d0

-- 
Brandon High : bhigh at freaks.com
On Apr 25, 2011, at 2:52 PM, Brandon High wrote:
> I'm in the process of replacing drives in a pool, and the resilver
> times seem to have increased with each device. The way that I'm doing
> this is by pulling a drive, physically replacing it, then doing
> 'cfgadm -c configure ____ ; zpool replace tank ____'. I don't have any
> hot-swap bays available, so I'm physically replacing the device before
> doing a 'zpool replace'.
>
> I'm replacing Western Digital WD10EADS 1TB drives with Hitachi 5K3000
> 3TB drives. Neither device is fast, but they aren't THAT slow. wsvc_t
> and asvc_t both look fairly healthy given the device types.

Look for 10-12 ms for asvc_t. In my experience, SATA disks tend to not
handle NCQ as well as SCSI disks handle TCQ -- go figure. In your iostats
below, you are obviously not bottlenecking on the disks.

> Replacing the first device (took about 20 hours) went about as
> expected. The second took about 44 hours. The third is still running
> and should finish in slightly over 48 hours.

If there is other work going on, then you might be hitting the resilver
throttle. By default, it will delay 2 clock ticks, if needed. It can be
turned off temporarily using:

echo zfs_resilver_delay/W0t0 | mdb -kw

to return to normal:

echo zfs_resilver_delay/W0t2 | mdb -kw

> I'm wondering if the following would help for the next drive:
> # zpool offline tank c2t4d0
> # cfgadm -c unconfigure sata3/4::dsk/c2t4d0
>
> At this point pull the drive and put it into an external USB adapter.
> Put the new drive in the hot-swap bay. The USB adapter shows up as
> c4t0d0.
>
> # zpool online tank c4t0d0
>
> This should re-add it to the pool and resilver the last few
> transactions that may have been missed, right?
>
> Then I want to actually replace the drive in the zpool:
> # cfgadm -c configure sata3/4
> # zpool replace tank c4t0d0 c2t4d0
>
> Will this work? Will the replace go faster, since it won't need to
> resilver from the parity data?

Probably won't work because it does not make the resilvering drive any
faster.
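If you want to double-check what the throttle is set to before and after
poking it, something like this should print the current value in decimal
(illustrative only, assuming the same mdb -k access):

echo zfs_resilver_delay/D | mdb -k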
 -- richard
Fred Liu
2011-Apr-25 23:53 UTC
[zfs-discuss] How does ZFS dedup space accounting work with quota?
Cindy,

The following is quoted from the ZFS Dedup FAQ:

"Deduplicated space accounting is reported at the pool level. You must use
the zpool list command rather than the zfs list command to identify disk
space consumption when dedup is enabled. If you use the zfs list command
to review deduplicated space, you might see that the file system appears
to be increasing because we're able to store more data on the same
physical device. Using the zpool list will show you how much physical
space is being consumed and it will also show you the dedup ratio. The df
command is not dedup-aware and will not provide accurate space
accounting."

So how can I set the quota size on a file system with dedup enabled?

Thanks.

Fred
On Mon, Apr 25, 2011 at 4:45 PM, Richard Elling
<richard.elling at gmail.com> wrote:
> If there is other work going on, then you might be hitting the resilver
> throttle. By default, it will delay 2 clock ticks, if needed.

There is some other access to the pool from nfs and cifs clients, but not
much, and mostly reads.

Setting zfs_resilver_delay seems to have helped some, based on the iostat
output. Are there other tunables?

> Probably won't work because it does not make the resilvering drive
> any faster.

It doesn't seem like the devices are the bottleneck, even with the delay
turned off.

$ iostat -xn 60 3
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  369.2   11.5 5577.0   71.3  0.7  0.7    1.9    1.9  14  29 c2t0d0
  371.9   11.5 5570.3   71.3  0.7  0.7    1.7    1.8  13  29 c2t1d0
  369.9   11.5 5574.4   71.3  0.7  0.7    1.8    1.9  14  29 c2t2d0
  370.7   11.5 5573.9   71.3  0.7  0.7    1.8    1.9  14  29 c2t3d0
  368.0   11.5 5553.1   71.3  0.7  0.7    1.8    1.9  14  29 c2t4d0
  196.1  172.8 2825.5 2436.6  0.3  1.1    0.8    3.0   6  26 c2t5d0
  183.6  184.9 2717.6 2674.7  0.5  1.3    1.4    3.5  11  31 c2t6d0
  393.0   11.2 5540.7   71.3  0.5  0.6    1.3    1.5  12  26 c2t7d0
   95.8    1.2   95.6   16.2  0.0  0.0    0.2    0.2   0   1 c0t0d0
    0.9    1.2    3.6   16.2  0.0  0.0    7.5    1.9   0   0 c0t1d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  891.2   11.8 2386.9   64.4  0.0  1.2    0.0    1.3   1  36 c2t0d0
  919.9   12.1 2351.8   64.6  0.0  1.1    0.0    1.2   0  35 c2t1d0
  906.9   12.1 2346.1   64.6  0.0  1.2    0.0    1.3   0  36 c2t2d0
  877.9   11.6 2351.0   64.5  0.7  0.5    0.8    0.6  23  35 c2t3d0
  883.4   12.0 2322.0   64.4  0.2  1.0    0.2    1.1   7  35 c2t4d0
    0.8  758.0    0.8 1910.4  0.2  5.0    0.2    6.6   3  72 c2t5d0
  882.7   11.4 2355.1   64.4  0.8  0.4    0.9    0.4  27  34 c2t6d0
  907.8   11.4 2373.1   64.5  0.7  0.3    0.8    0.4  23  30 c2t7d0
 1607.8    9.4 1568.2   83.0  0.1  0.2    0.1    0.1   3  18 c0t0d0
    7.3    9.1   23.5   83.0  0.1  0.0    6.0    1.4   2   2 c0t1d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  960.3   12.7 2868.0   59.0  1.1  0.7    1.2    0.8  37  52 c2t0d0
  963.2   12.7 2877.5   59.1  1.1  0.8    1.1    0.8  36  51 c2t1d0
  960.3   12.6 2844.7   59.1  1.1  0.7    1.1    0.8  37  52 c2t2d0
 1000.1   12.8 2827.1   59.0  0.6  1.2    0.6    1.2  21  52 c2t3d0
  960.9   12.3 2811.1   59.0  1.3  0.6    1.3    0.6  42  51 c2t4d0
    0.5  962.2    0.4 2418.3  0.0  4.1    0.0    4.3   0  59 c2t5d0
 1014.2   12.3 2820.6   59.1  0.8  0.8    0.8    0.8  28  48 c2t6d0
 1031.2   12.5 2822.0   59.1  0.8  0.8    0.7    0.8  26  45 c2t7d0
 1836.4    0.0 1783.4    0.0  0.0  0.2    0.0    0.1   1  19 c0t0d0
    5.3    0.0    5.3    0.0  0.0  0.0    1.1    1.5   1   1 c0t1d0

-- 
Brandon High : bhigh at freaks.com
Brandon High
2011-Apr-26 00:50 UTC
[zfs-discuss] How does ZFS dedup space accounting work with quota?
On Mon, Apr 25, 2011 at 4:53 PM, Fred Liu <Fred_Liu at issi.com> wrote:
> So how can I set the quota size on a file system with dedup enabled?

I believe the quota applies to the non-dedup'd data size. If a user
stores 10G of data, it will use 10G of quota, regardless of whether it
dedups at 100:1 or 1:1.

-B

-- 
Brandon High : bhigh at freaks.com
Fred Liu
2011-Apr-26 01:13 UTC
[zfs-discuss] How does ZFS dedup space accounting work with quota?
Hmmmm, it seems dedup is pool-based not filesystem-based.
If it can have fine-grained granularity (like based on fs), that will be
great! It is a pity! NetApp is sweet in this aspect.

Thanks.

Fred

> -----Original Message-----
> From: Brandon High [mailto:bhigh at freaks.com]
> Sent: Tuesday, April 26, 2011 8:50 AM
> To: Fred Liu
> Cc: cindy.swearingen at oracle.com; ZFS discuss
> Subject: Re: [zfs-discuss] How does ZFS dedup space accounting work
> with quota?
>
> On Mon, Apr 25, 2011 at 4:53 PM, Fred Liu <Fred_Liu at issi.com> wrote:
> > So how can I set the quota size on a file system with dedup enabled?
>
> I believe the quota applies to the non-dedup'd data size. If a user
> stores 10G of data, it will use 10G of quota, regardless of whether it
> dedups at 100:1 or 1:1.
>
> -B
>
> --
> Brandon High : bhigh at freaks.com
Ian Collins
2011-Apr-26 01:23 UTC
[zfs-discuss] How does ZFS dedup space accounting work with quota?
On 04/26/11 01:13 PM, Fred Liu wrote:
> Hmmmm, it seems dedup is pool-based not filesystem-based.

That's correct.  Although it can be turned off and on at the filesystem
level (assuming it is enabled for the pool).

> If it can have fine-grained granularity (like based on fs), that will
> be great! It is a pity! NetApp is sweet in this aspect.

So what happens to user B's quota if user B stores a ton of data that is
a duplicate of user A's data and then user A deletes the original?

-- 
Ian.
On Mon, Apr 25, 2011 at 5:26 PM, Brandon High <bhigh at freaks.com> wrote:
> Setting zfs_resilver_delay seems to have helped some, based on the
> iostat output. Are there other tunables?

I found zfs_resilver_min_time_ms while looking. I've tried bumping it up
considerably, without much change. 'zpool status' is still showing:

  scan: resilver in progress since Sat Apr 23 17:03:13 2011
    6.06T scanned out of 6.40T at 36.0M/s, 2h46m to go
    769G resilvered, 94.64% done

'iostat -xn' shows asvc_t under 10ms still.

Increasing the per-device queue depth has increased the asvc_t but hasn't
done much to affect the throughput. I'm using:

echo zfs_vdev_max_pending/W0t35 | pfexec mdb -kw
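If I need to back any of this out later, I believe the defaults are 2 for
zfs_resilver_delay, 3000 for zfs_resilver_min_time_ms and 10 for
zfs_vdev_max_pending on recent builds -- don't hold me to those numbers --
so roughly:

echo zfs_resilver_delay/W0t2 | pfexec mdb -kw
echo zfs_resilver_min_time_ms/W0t3000 | pfexec mdb -kw
echo zfs_vdev_max_pending/W0t10 | pfexec mdb -kw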
-B

-- 
Brandon High : bhigh at freaks.com

Erik Trimble
2011-Apr-26 04:47 UTC
[zfs-discuss] How does ZFS dedup space accounting work with quota?
On 4/25/2011 6:23 PM, Ian Collins wrote:
> On 04/26/11 01:13 PM, Fred Liu wrote:
>> Hmmmm, it seems dedup is pool-based not filesystem-based.
> That's correct.  Although it can be turned off and on at the filesystem
> level (assuming it is enabled for the pool).

Which is effectively the same as choosing per-filesystem dedup. Just the
inverse. You turn it on at the pool level, and off at the filesystem
level, which is identical to "off at the pool level, on at the filesystem
level" that NetApp does.

>> If it can have fine-grained granularity (like based on fs), that will
>> be great! It is a pity! NetApp is sweet in this aspect.
>
> So what happens to user B's quota if user B stores a ton of data that is
> a duplicate of user A's data and then user A deletes the original?

Actually, right now, nothing happens to B's quota. He's always charged
the un-deduped amount for his quota usage, whether or not dedup is
enabled, and regardless of how much of his data is actually deduped.
Which is as it should be, as quotas are about limiting how much a user is
consuming, not how much the backend needs to store that data consumption.

e.g.

A, B, C, & D all have 100MB of data in the pool, with dedup on.

20MB of storage has a dedup factor of 3:1 (common to A, B, & C)
50MB of storage has a dedup factor of 2:1 (common to A & B)

Thus, the amount of unique data would be:

A: 100 - 20 - 50 = 30MB
B: 100 - 20 - 50 = 30MB
C: 100 - 20 = 80MB
D: 100MB

Summing it all up, you would have an actual storage consumption of 70MB
(the 50+20 that dedups) + 30+30+80+100 (unique data) = 310MB of actual
storage for 400MB of apparent storage (i.e. a dedup ratio of 1.29:1).

A, B, C, & D would each still have a quota usage of 100MB.

-- 
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA
Ian Collins
2011-Apr-26 08:27 UTC
[zfs-discuss] How does ZFS dedup space accounting work with quota?
On 04/26/11 04:47 PM, Erik Trimble wrote:
> Actually, right now, nothing happens to B's quota. He's always charged
> the un-deduped amount for his quota usage, whether or not dedup is
> enabled, and regardless of how much of his data is actually deduped.
> Which is as it should be, as quotas are about limiting how much a user
> is consuming, not how much the backend needs to store that data
> consumption.

That was the point I was making: quota on deduped usage does not make
sense.  I was curious how he proposed doing it the other way!

-- 
Ian.
Fred Liu
2011-Apr-26 10:59 UTC
[zfs-discuss] How does ZFS dedup space accounting work with quota?
> -----Original Message-----
> From: Erik Trimble [mailto:erik.trimble at oracle.com]
> Sent: Tuesday, April 26, 2011 12:47 PM
> To: Ian Collins
> Cc: Fred Liu; ZFS discuss
> Subject: Re: [zfs-discuss] How does ZFS dedup space accounting work
> with quota?
>
> Which is effectively the same as choosing per-filesystem dedup. Just the
> inverse. You turn it on at the pool level, and off at the filesystem
> level, which is identical to "off at the pool level, on at the
> filesystem level" that NetApp does.

My original thought is just enabling dedup on one file system to check if
it is mature enough or not in the production env. And I have only one
pool. If dedup is filesystem-based, the effect of dedup will be throttled
within one file system and won't propagate to the whole pool. Just
disabling dedup cannot get rid of all the effects (such as the possible
performance degradation, etc.), because the already dedup'd data is still
there and the DDT is still there. The thinkable thorough way is totally
removing all the dedup'd data. But is it the real thorough way?

And also the dedup space saving is kind of indirect. We cannot directly
get the space saving in the file system where dedup is actually enabled,
for it is pool-based. Even from the pool perspective, it is still sort of
indirect and obscure in my opinion: the real space saving is the delta
between the output of 'zpool list' and the sum of 'du' on all the folders
in the pool (or 'df' on the mount point folder -- not sure if a percentage
like 123% will occur or not... grinning ^:^). But in NetApp, we can use
'df -s' to directly and easily get the space saving.

> Actually, right now, nothing happens to B's quota. He's always charged
> the un-deduped amount for his quota usage, whether or not dedup is
> enabled, and regardless of how much of his data is actually deduped.

It is true, quota is in charge of logical data not physical data.
Let's assume an interesting scenario -- say the pool is 100% full in
logical data (such as 'df' tells you 100% used) but not full in physical
data (such as 'zpool list' tells you still some space available). Can we
continue writing data into this pool?
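Maybe something like this on a small throwaway file-backed pool would show
it -- the names here are made up and I have not actually tried it yet:

# mkfile 200m /var/tmp/vdev1
# zpool create testdedup /var/tmp/vdev1
# zfs set dedup=on testdedup
# dd if=/dev/urandom of=/testdedup/seed bs=128k count=400
# cp /testdedup/seed /testdedup/copy1
# cp /testdedup/seed /testdedup/copy2
(keep making copies -- they should dedup against the seed, so what 'df'
and 'zpool list' report will drift apart -- and then see whether one more
copy still succeeds)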
Anybody interested in doing this experiment? ;-)

Thanks.

Fred
Erik Trimble
2011-Apr-26 16:07 UTC
[zfs-discuss] How does ZFS dedup space accounting work with quota?
On 4/26/2011 3:59 AM, Fred Liu wrote:
> My original thought is just enabling dedup on one file system to check
> if it is mature enough or not in the production env. And I have only
> one pool. If dedup is filesystem-based, the effect of dedup will be
> throttled within one file system and won't propagate to the whole pool.
> Just disabling dedup cannot get rid of all the effects (such as the
> possible performance degradation, etc.), because the already dedup'd
> data is still there and the DDT is still there. The thinkable thorough
> way is totally removing all the dedup'd data. But is it the real
> thorough way?

You can do that now. Enable Dedup at the pool level. Turn it OFF on all
the existing filesystems. Make a new "test" filesystem, and run your
tests.

Remember, only data written AFTER the dedup value is turned on will be
de-duped. Existing data will NOT. And, though dedup is enabled at the
pool level, it will only consider data written into filesystems that have
the dedup value as ON. Thus, in your case, writing to the single
filesystem with dedup on will NOT have ZFS check for duplicates from the
other filesystems. It will check only inside itself, as it's the only
filesystem with dedup enabled.

If the experiment fails, you can safely destroy your test dedup
filesystem, then unset dedup at the pool level, and you're fine.
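In command form, that's roughly the following. The dataset names
(tank/data, tank/home, tank/dedup-test) are made up, so substitute your
own; the new filesystem inherits dedup=on from the pool's root dataset,
and the final 'zfs inherit' drops the root back to the default (off):

# zfs set dedup=on tank
# zfs set dedup=off tank/data
# zfs set dedup=off tank/home
  (repeat dedup=off for each existing filesystem)
# zfs create tank/dedup-test
  (run the test workload in tank/dedup-test)
# zfs destroy tank/dedup-test
# zfs inherit dedup tank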
> And also the dedup space saving is kind of indirect. We cannot directly
> get the space saving in the file system where dedup is actually enabled,
> for it is pool-based. Even from the pool perspective, it is still sort
> of indirect and obscure in my opinion: the real space saving is the
> delta between the output of 'zpool list' and the sum of 'du' on all the
> folders in the pool (or 'df' on the mount point folder -- not sure if a
> percentage like 123% will occur or not... grinning ^:^). But in NetApp,
> we can use 'df -s' to directly and easily get the space saving.

That is true. Honestly, however, it would be hard to do this on a
per-filesystem basis. ZFS allows for the creation of an arbitrary number
of filesystems in a pool, far higher than NetApp does. The result is that
the "filesystem" concept is much more flexible in ZFS. The downside is
that keeping dedup statistics for a given arbitrary set of data is
logistically difficult.

An analogy with NetApp is thus: Can you use any tool to find the dedup
ratio of an arbitrary directory tree INSIDE a NetApp filesystem?

> It is true, quota is in charge of logical data not physical data.
> Let's assume an interesting scenario -- say the pool is 100% full in
> logical data (such as 'df' tells you 100% used) but not full in physical
> data (such as 'zpool list' tells you still some space available). Can we
> continue writing data into this pool?

Sure, you can keep writing to the volume. What matters to the OS is what
*it* thinks, not what some userland app thinks.

-- 
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA
Fred Liu
2011-Apr-26 16:29 UTC
[zfs-discuss] How does ZFS dedup space accounting work with quota?
> -----Original Message-----
> From: Erik Trimble [mailto:erik.trimble at oracle.com]
> Sent: Wednesday, April 27, 2011 12:07 AM
> To: Fred Liu
> Cc: Ian Collins; ZFS discuss
> Subject: Re: [zfs-discuss] How does ZFS dedup space accounting work
> with quota?
>
> You can do that now. Enable Dedup at the pool level. Turn it OFF on all
> the existing filesystems. Make a new "test" filesystem, and run your
> tests.

Thanks. I will have a try.

> An analogy with NetApp is thus: Can you use any tool to find the dedup
> ratio of an arbitrary directory tree INSIDE a NetApp filesystem?

That is true. There is no apple-to-apple corresponding terminology in
NetApp for a ZFS file system. If we take a NetApp 'volume' as the
counterpart of a ZFS 'file system', then that is doable, because dedup in
NetApp is volume-based.

> Sure, you can keep writing to the volume. What matters to the OS is what
> *it* thinks, not what some userland app thinks.

OK. And then what will the output of 'df' be?

Thanks.

Fred
Erik Trimble
2011-Apr-26 17:05 UTC
[zfs-discuss] How does ZFS dedup space accounting work with quota?
On 4/26/2011 9:29 AM, Fred Liu wrote:
>> Sure, you can keep writing to the volume. What matters to the OS is
>> what *it* thinks, not what some userland app thinks.
>
> OK. And then what will the output of 'df' be?

110% full. Or whatever. df will just keep reporting what it sees. Even if
what it *thinks* doesn't make sense to the human reading it.

-- 
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA
The last resilver finished after 50 hours. Ouch.

I'm onto the next device now, which seems to be progressing much, much
better. The tunings that I'm using right now are:

echo zfs_resilver_delay/W0t0 | mdb -kw
echo zfs_resilver_min_time_ms/W0t20000 | pfexec mdb -kw

Things could slow down, but at 13 hours in, the resilver has been managing
~100M/s and is 70% done.

-B

-- 
Brandon High : bhigh at freaks.com
Fred Liu
2011-Apr-26 22:31 UTC
[zfs-discuss] How does ZFS dedup space accounting work with quota?
> -----Original Message-----
> From: Erik Trimble [mailto:erik.trimble at oracle.com]
> Sent: Wednesday, April 27, 2011 1:06 AM
> To: Fred Liu
> Cc: Ian Collins; ZFS discuss
> Subject: Re: [zfs-discuss] How does ZFS dedup space accounting work
> with quota?
>
> 110% full. Or whatever. df will just keep reporting what it sees. Even
> if what it *thinks* doesn't make sense to the human reading it.

Gotcha!

Thanks.

Fred