Hi all,

I'm new here; I've only been using ZFS for about half a year. We are running Nexenta in a corporate pilot environment. These days, while trying to move around 4TB of data from an old pool (4*2TB raidz) to a new pool (11*2TB raidz2), the transfer never seems to finish successfully:

1. I tried cp first. The initial speed is great at 250MB/s, but it suddenly drops to 400KB/s once around 400-600GB has been copied. I've tried several times.
2. rsync. Speed crawls at around 15MB/s. It does finish, but performance seems degraded even after rsync is done.
3. zfs send | recv. Speed is close to cp, but it still suddenly drops to a few hundred KB/s after around 1.5TB has been migrated.
4. cp to a remote server over a 1Gb NIC. This finishes, but I can see a significant system slowdown.

What is the root cause of this? Is the memory too low? All my ZFS systems have 12GB of memory; I'm considering adding at least 12GB more to each of them.

thanks
fei
On 09/ 9/10 01:14 PM, Fei Xu wrote:

> I'm new here; I've only been using ZFS for about half a year. We are running Nexenta in a corporate pilot environment. These days, while trying to move around 4TB of data from an old pool (4*2TB raidz) to a new pool (11*2TB raidz2), the transfer never seems to finish successfully:
> 1. I tried cp first. The initial speed is great at 250MB/s, but it suddenly drops to 400KB/s once around 400-600GB has been copied. I've tried several times.

11*2TB raidz2 is a bad idea. The stripe is too wide, so performance will suffer.

> 2. rsync. Speed crawls at around 15MB/s. It does finish, but performance seems degraded even after rsync is done.
> 3. zfs send | recv. Speed is close to cp, but it still suddenly drops to a few hundred KB/s after around 1.5TB has been migrated.
> 4. cp to a remote server over a 1Gb NIC. This finishes, but I can see a significant system slowdown.
>
> What is the root cause of this? Is the memory too low? All my ZFS systems have 12GB of memory; I'm considering adding at least 12GB more to each of them.

Run zpool iostat -v on the receiver when things slow down and post the result. I moved (send/receive) 2TB of data across a LAN last week and there wasn't any variation in speed.

-- Ian.
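For example, to watch the receive side at a ten-second interval while the copy runs (the pool name here stands in for the destination pool named later in the thread; the interval is arbitrary):

    zpool iostat -v sh001a 10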
Thank you Ian. I've rebuilt the pool as 9*2TB raidz2 and started the zfs send; results should come out in about 3 hours.

thanks
fei
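For reference, a migration like this is typically driven by a recursive snapshot piped between the pools. A minimal sketch, using the pool names that appear later in the thread; the snapshot name and receive flags are illustrative, not necessarily what was actually run:

    # snapshot the source, then replicate the whole tree into the new pool
    zfs snapshot -r vol1@migrate
    zfs send -R vol1@migrate | zfs recv -Fdu sh001a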
Now it gets extremely slow at around 400G sent.

The first iostat result below was captured when the send operation started:

               capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
sh001a      37.6G  16.2T      0  1.17K     82   146M
  raidz2    37.6G  16.2T      0  1.17K     82   146M
    c0t10d0     -      -      0    201    974  21.0M
    c0t11d0     -      -      0    201    974  21.1M
    c0t23d0     -      -      0    201  1.56K  21.0M
    c0t24d0     -      -      0    201  1.26K  21.0M
    c0t25d0     -      -      0    201    662  21.1M
    c0t26d0     -      -      0    201  1.26K  21.1M
    c0t2d0      -      -      0    202    974  21.1M
    c0t5d0      -      -      0    200    662  20.9M
    c0t6d0      -      -      0    200  1.26K  21.0M
----------  -----  -----  -----  -----  -----  -----
syspool     10.6G   137G     11     13   668K   137K
  c3d0s0    10.6G   137G     11     13   668K   137K
----------  -----  -----  -----  -----  -----  -----
vol1        5.40T  1.85T    621      5  76.9M  12.4K
  raidz1    5.40T  1.85T    621      5  76.9M  12.4K
    c0t22d0     -      -    280      3  19.5M  14.2K
    c0t3d0      -      -    279      3  19.5M  13.9K
    c0t20d0     -      -    280      3  19.5M  14.2K
    c0t21d0     -      -    280      3  19.5M  13.9K
----------  -----  -----  -----  -----  -----  -----

The result below is from when the zfs send got stuck at 397G. The disk I/O looks quite normal, so where is the data going? Note that iostat itself responded very slowly.

               capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
sh001a       397G  15.9T      0  1.08K    490   136M
  raidz2     397G  15.9T      0  1.08K    490   136M
    c0t10d0     -      -      0    185  1.68K  19.4M
    c0t11d0     -      -      0    185  1.71K  19.4M
    c0t23d0     -      -      0    185  1.99K  19.4M
    c0t24d0     -      -      0    185  1.79K  19.4M
    c0t25d0     -      -      0    185  2.10K  19.4M
    c0t26d0     -      -      0    185  2.07K  19.4M
    c0t2d0      -      -      0    185  1.99K  19.4M
    c0t5d0      -      -      0    185  2.12K  19.4M
    c0t6d0      -      -      0    185  2.23K  19.4M
----------  -----  -----  -----  -----  -----  -----
syspool     10.6G   137G      2      6   131K  48.0K
  c3d0s0    10.6G   137G      2      6   131K  48.0K
----------  -----  -----  -----  -----  -----  -----
vol1        5.40T  1.85T   1009      1   125M  2.85K
  raidz1    5.40T  1.85T   1009      1   125M  2.85K
    c0t22d0     -      -    453      0  31.6M  2.64K
    c0t3d0      -      -    452      0  31.6M  2.58K
    c0t20d0     -      -    453      0  31.6M  2.64K
    c0t21d0     -      -    453      0  31.6M  2.56K
----------  -----  -----  -----  -----  -----  -----
On 09/ 9/10 02:42 PM, Fei Xu wrote:

> Now it gets extremely slow at around 400G sent.
>
> The first iostat result below was captured when the send operation started:
>
>                capacity     operations    bandwidth
> pool        alloc   free   read  write   read  write
> ----------  -----  -----  -----  -----  -----  -----
> sh001a      37.6G  16.2T      0  1.17K     82   146M
>   raidz2    37.6G  16.2T      0  1.17K     82   146M

<snip>

> The result below is from when the zfs send got stuck at 397G. The disk I/O looks quite normal, so where is the data going? Note that iostat itself responded very slowly.
>
>                capacity     operations    bandwidth
> pool        alloc   free   read  write   read  write
> ----------  -----  -----  -----  -----  -----  -----
> sh001a       397G  15.9T      0  1.08K    490   136M
>   raidz2     397G  15.9T      0  1.08K    490   136M

Have you got dedup enabled? Note the read bandwidth is much higher.

-- Ian.
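For the record, the property is easy to confirm directly on the pools in question:

    zfs get dedup sh001a vol1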
> Have you got dedup enabled? Note the read bandwidth is much higher.
>
> -- Ian.

No, dedup is not enabled, since it's still not stable enough even for a test environment.

Here is a JPG of the read/write indicator. The RED line is read and the GREEN line is write. You can see that because the destination pool has more disks, and therefore more throughput capacity, most of the time it is just waiting on reads from the source pool.

http://bbs.manmi.com/UploadFile/2010-9/20109911304144848.jpg
I dug deeper into it and might have found some useful information. I attached an X25 SSD as ZIL to see if it helps, but no luck. I ran iostat -xnz for more details and got the interesting results below (maybe too long). Some explanation:

1. c2d0 is the SSD for the ZIL.
2. c0t3d0, c0t20d0, c0t21d0, c0t22d0 make up the source pool.
3. The first result matches zpool iostat -v. I ran -xnz several times, and the first result was always good.
4. Then, 30 seconds later, comes the second block, and then the 3rd and 4th. You can see my source pool disks are extremely slow and busy, one by one. asvc_t is high, %b is high. That's abnormal.
5. I think it might be because those 4 drives are older WD Green disks with the head park/unpark "defect" and the TLER behavior; the other twelve 2TB disks are the latest revision, which already has the fix.
6. Let me try the test on 1TB disks and see if there is any issue.

root@sh1slcs001:/volumes# iostat -xnz 30
                    extended device statistics
    r/s    w/s    kr/s     kw/s  wait  actv  wsvc_t   asvc_t  %w  %b  device
    0.4    0.0     3.0      0.3   0.0   0.0     0.0      0.2   0   0  c2d0
    1.4    8.9    57.8     40.3   0.0   0.1     2.5      5.8   1   5  c3d0
    0.0    0.0     0.0      0.0   0.0   0.0     0.0    986.5   0   0  fd0
    2.8  154.8   141.5  16517.7   0.0   2.4     0.1     15.0   2  25  c0t2d0
  408.4    0.3 29140.4      0.9   0.0   4.1     0.1     10.1   2  55  c0t3d0
    2.8  154.9   136.7  16519.6   0.0   2.4     0.1     14.9   2  25  c0t5d0
    2.8  155.0   139.4  16520.9   0.0   2.4     0.1     15.0   2  26  c0t6d0
    0.0    0.0     0.5      0.0   0.0   0.0     0.0      0.1   0   0  c0t7d0
    0.0    0.0     0.5      0.0   0.0   0.0     0.0      3.4   0   0  c0t8d0
    0.0    0.0     0.4      0.0   0.0   0.0     0.0      0.1   0   0  c0t9d0
    2.8  154.9   139.2  16523.9   0.0   2.4     0.1     15.2   2  26  c0t10d0
    2.8  155.0   136.1  16525.4   0.0   2.5     0.1     15.7   2  26  c0t11d0
  409.9    0.3 29152.1      0.9   0.0   3.9     0.1      9.4   2  52  c0t20d0
  409.5    0.3 29152.0      0.9   0.0   3.9     0.1      9.6   2  52  c0t21d0
  409.2    0.3 29155.7      0.9   0.0   4.1     0.1     10.0   2  55  c0t22d0
    2.8  155.0   138.6  16527.6   0.0   2.4     0.1     15.2   2  26  c0t23d0
    2.9  154.9   138.0  16528.8   0.0   2.4     0.1     15.1   2  26  c0t24d0
    2.9  155.0   136.1  16530.2   0.0   2.5     0.1     15.8   2  27  c0t25d0
    2.8  155.1   139.7  16531.6   0.0   2.4     0.1     15.5   2  26  c0t26d0
                    extended device statistics
    r/s    w/s    kr/s     kw/s  wait  actv  wsvc_t   asvc_t  %w  %b  device
    0.3    0.0     1.2      0.0   0.0   0.0     0.0      0.1   0   0  c2d0
    0.1   17.7     0.1     51.7   0.0   0.1     0.2      4.1   0   7  c3d0
    0.1    2.1     0.0     79.8   0.0   0.0     0.1      4.0   0   0  c0t2d0
    0.2    0.0     7.1      0.0   0.1   2.3   278.5  11365.1   1  46  c0t3d0
    0.1    2.2     0.0     79.9   0.0   0.0     0.1      3.7   0   0  c0t5d0
    0.1    2.3     0.0     80.0   0.0   0.0     0.1      9.2   0   0  c0t6d0
    0.1    2.5     0.0     80.1   0.0   0.0     0.1      3.8   0   0  c0t10d0
    0.1    2.4     0.0     80.0   0.0   0.0     0.1      9.5   0   0  c0t11d0
    1.9    0.0   133.0      0.0   0.1   2.8    60.2   1520.6   2  51  c0t20d0
    1.6    0.0   110.9      0.0   0.0   0.0     0.0      7.6   0   0  c0t21d0
    1.6    0.0   109.5      0.0   0.0   0.0     0.0     11.5   0   0  c0t22d0
    0.1    2.4     0.0     80.0   0.0   0.0     0.1      8.6   0   0  c0t23d0
    0.1    2.6     0.0     80.0   0.0   0.0     0.1      2.9   0   0  c0t24d0
    0.1    2.4     0.0     79.9   0.0   0.0     0.1      6.2   0   0  c0t25d0
    0.1    2.3     0.0     79.9   0.0   0.0     0.1      7.8   0   0  c0t26d0
                    extended device statistics
    r/s    w/s    kr/s     kw/s  wait  actv  wsvc_t   asvc_t  %w  %b  device
    0.1    0.0     0.6      0.0   0.0   0.0     0.0      0.1   0   0  c2d0
    0.0   21.3     0.0     82.3   0.0   0.1     0.4      4.0   0   8  c3d0
    0.0    2.1     0.0     68.2   0.0   0.0     0.2      3.8   0   0  c0t2d0
    0.7    0.0    39.1      0.0   0.0   0.6    64.0    884.1   1  10  c0t3d0
    0.0    2.2     0.0     68.2   0.0   0.0     0.1      3.5   0   0  c0t5d0
    0.0    2.3     0.0     68.3   0.0   0.0     0.1      3.5   0   0  c0t6d0
    0.0    2.3     0.0     68.3   0.0   0.0     0.1      3.2   0   0  c0t10d0
    0.0    2.2     0.0     68.3   0.0   0.0     0.1      3.7   0   0  c0t11d0
    2.0    0.0   133.7      0.0   0.0   0.0     0.0      9.4   0   0  c0t20d0
    2.1    0.0   135.8      0.0   0.1   5.2    67.8   2498.1   3  88  c0t21d0
    2.8    0.0   134.4      0.0   0.0   0.0     0.0      3.2   0   0  c0t22d0
    0.0    2.1     0.0     68.2   0.0   0.0     0.1      3.6   0   0  c0t23d0
    0.0    2.4     0.0     68.4   0.0   0.0     0.1      3.1   0   0  c0t24d0
    0.0    2.4     0.0     68.4   0.0   0.0     0.1      3.4   0   0  c0t25d0
    0.0    2.2     0.0     68.3   0.0   0.0     0.1      3.3   0   0  c0t26d0
                    extended device statistics
    r/s    w/s    kr/s     kw/s  wait  actv  wsvc_t   asvc_t  %w  %b  device
    0.1    0.0     0.6      0.0   0.0   0.0     0.0      0.1   0   0  c2d0
    0.1   16.5     0.1     44.5   0.0   0.1     0.2      4.8   0   7  c3d0
    0.0    2.3     0.0     73.8   0.0   0.0     0.1      3.8   0   0  c0t2d0
    3.5    0.0   246.8      0.0   0.0   0.8     6.3    229.8   1  20  c0t3d0
    0.0    2.0     0.0     73.7   0.0   0.1     0.1     48.3   0   2  c0t5d0
    0.0    2.2     0.0     73.8   0.0   0.0     0.1      5.4   0   0  c0t6d0
    0.0    2.4     0.0     74.0   0.0   0.0     0.1      5.4   0   0  c0t10d0
    0.0    2.5     0.0     74.0   0.0   0.0     0.1      4.7   0   0  c0t11d0
    3.1    0.0   136.6      0.0   0.0   0.0     0.0      1.4   0   0  c0t20d0
    0.7    0.0    29.2      0.0   0.0   0.6     0.0    911.0   0  12  c0t21d0
    1.9    0.0   138.7      0.0   0.1   4.7    73.0   2428.6   2  66  c0t22d0
    0.0    2.2     0.0     73.9   0.0   0.0     0.1      5.4   0   0  c0t23d0
    0.0    2.3     0.0     73.9   0.0   0.1     0.1     27.7   0   2  c0t24d0
    0.0    2.4     0.0     74.0   0.0   0.0     0.1      4.7   0   0  c0t25d0
    0.0    2.3     0.0     73.9   0.0   0.0     0.1      7.3   0   0  c0t26d0
On 08 September, 2010 - Fei Xu sent me these 5,9K bytes:

> I dug deeper into it and might have found some useful information.
> I attached an X25 SSD as ZIL to see if it helps, but no luck.
> I ran iostat -xnz for more details and got the interesting results below. Some explanation:
> 1. c2d0 is the SSD for the ZIL.
> 2. c0t3d0, c0t20d0, c0t21d0, c0t22d0 make up the source pool.

...

>     r/s    w/s    kr/s     kw/s  wait  actv  wsvc_t   asvc_t  %w  %b  device
>     0.2    0.0     7.1      0.0   0.1   2.3   278.5  11365.1   1  46  c0t3d0

Service time here is crap. 11 seconds to reply.

>     1.9    0.0   133.0      0.0   0.1   2.8    60.2   1520.6   2  51  c0t20d0

1.5 seconds to reply. Crap.

...

>     0.7    0.0    39.1      0.0   0.0   0.6    64.0    884.1   1  10  c0t3d0
>     2.1    0.0   135.8      0.0   0.1   5.2    67.8   2498.1   3  88  c0t21d0

...

>     3.5    0.0   246.8      0.0   0.0   0.8     6.3    229.8   1  20  c0t3d0
>     0.7    0.0    29.2      0.0   0.0   0.6     0.0    911.0   0  12  c0t21d0
>     1.9    0.0   138.7      0.0   0.1   4.7    73.0   2428.6   2  66  c0t22d0

Service times here are crap. Disks are malfunctioning in some way. If your source disks can take seconds (or 10+ seconds) to reply, then of course your copy will be slow. The disk is probably having a hard time reading the data or something.

/Tomas
--
Tomas Ögren, stric@acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
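To catch these outliers without eyeballing every 30-second block, something like this one-liner works (a sketch: field 8 of iostat -xnz output is asvc_t, and the 100 ms threshold is arbitrary):

    # print any device whose average service time exceeds 100 ms
    iostat -xnz 30 | awk '$8+0 > 100 {print $11, $8 " ms"}'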
> Service times here are crap. Disks are malfunctioning in some way. If your source disks can take seconds (or 10+ seconds) to reply, then of course your copy will be slow. The disk is probably having a hard time reading the data or something.

Yeah, that should not go over 15ms. I just cannot understand why it starts OK with hundreds of GB transferred and then suddenly falls "asleep". By the way, the WDIDLE timer is already disabled, which might otherwise cause some issues. I've switched to another system to test zfs send between an 8*1TB pool and a 4*1TB pool; hope everything is OK in that case.
On Sep 9, 2010, at 8:27 AM, Fei Xu <twinsenxu@hotmail.com> wrote:

> Yeah, that should not go over 15ms. I just cannot understand why it starts OK with hundreds of GB transferred and then suddenly falls "asleep".
> By the way, the WDIDLE timer is already disabled, which might otherwise cause some issues. I've switched to another system to test zfs send between an 8*1TB pool and a 4*1TB pool; hope everything is OK in that case.

This might be the dreaded WD TLER issue. Basically, after a bit error the drive keeps retrying the read operation over and over, trying to recover from the read error itself. With ZFS one really needs to disable this behavior and have the drive fail immediately.

Check your drives to see if they have this feature. If so, think about replacing the drives in the source pool that have long service times, and make sure this feature is disabled on the destination pool drives.

-Ross
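If smartmontools is installed, the drive's error-recovery timeout can be queried (and, on drives that support SCT ERC, set) from the command line. A sketch, with an illustrative device path whose exact form depends on the platform:

    # query the current read/write error-recovery timeouts
    smartctl -l scterc /dev/rdsk/c0t3d0
    # cap both at 7.0 seconds (units are tenths of a second)
    smartctl -l scterc,70,70 /dev/rdsk/c0t3d0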
On Sep 9, 2010, at 8:27 AM, Fei Xu <twinsenxu@hotmail.com> wrote:

> This might be the dreaded WD TLER issue. Basically, after a bit error the drive keeps retrying the read operation over and over, trying to recover from the read error itself. With ZFS one really needs to disable this behavior and have the drive fail immediately.
>
> Check your drives to see if they have this feature. If so, think about replacing the drives in the source pool that have long service times, and make sure this feature is disabled on the destination pool drives.
>
> -Ross

It might be due to TLER issues, but I'd try pinning the Greens down to SATA1 mode (use the jumper, or force it via the controller). It might help a bit with these disks, although they are not really suitable for use in any RAID configuration due to the TLER issue, which cannot be disabled in later firmware versions.

Yours
Markus Kovero
On Thu, 9 Sep 2010 14:05:51 +0000, Markus Kovero <Markus.Kovero@nebula.fi> wrote:

> It might be due to TLER issues, but I'd try pinning the Greens down to SATA1 mode (use the jumper, or force it via the controller). It might help a bit with these disks, although they are not really suitable for use in any RAID configuration due to the TLER issue, which cannot be disabled in later firmware versions.

Just to clarify: do you mean TLER should be off or on? TLER = Time Limited Error Recovery, so the drive only takes a maximum time (e.g. 7 seconds) to retrieve data before returning an error. So when you say "cannot be disabled", I think you mean "cannot be ENABLED"?

I've been doing a lot of research for a new storage box at work, and from reading the info available in the Storage forum on hardforum.com, the experts there seem to recommend NOT having TLER enabled when using ZFS, as ZFS can be configured for its own timeouts, etc. The main reason to use TLER is with hardware RAID cards, which will kick a drive out of the array if it takes longer than 10 seconds.

Can anyone else here comment if they have had experience with the WD drives and ZFS, and whether they have TLER enabled or disabled?

Cheers,
Mark
>>>>> "ml" == Mark Little <marklittle@koallo.com> writes:

    ml> Just to clarify: do you mean TLER should be off or on?

It should be set to "do not have asvc_t of 11 seconds and <1 io/s"... which is not one of the settings of the TLER knob.

This isn't a problem with the TLER *setting*. TLER does not even apply unless the drive has a latent sector error. TLER does not even apply unless the drive has a latent sector error. TLER does not even apply unless the drive has a latent sector error. GOT IT?

So if the drive is not defective, but is erratically showing huge latency when not busy, this isn't a TLER problem. It's a drive-is-an-unpredictable-piece-of-junk problem. Will the problem go away if you change the TLER setting to the opposite of whatever it is? Who knows?! It shouldn't, based on the claimed purpose of TLER, but in reality, maybe, maybe not, because the drive shouldn't ("shouldn't", haha) act like that to begin with. The problem will be more likely to go away if you replace the drive with a different model, though.

    ml> Storage forum on hardforum.com, the experts there seem to
    ml> recommend NOT having TLER enabled when using ZFS as ZFS can be
    ml> configured for its timeouts, etc,

I don't believe there are any configurable timeouts in ZFS. The ZFS developers take the position that timeouts are not our problem and push all that work down the stack to the controller driver and the disk driver, which cooperate (this is two drivers now, plus perhaps a third "SCSI mid-layer", for some controllers but not others) to implement a variety of inconsistent, silly, undocumented, cargo-cult flailing timeout regimes that we all have to put up with. However, they are always quite long. The ATA max timeout is 30 sec, and AIUI they are all much longer than that.

My new favorite thing, though, is the reference counting. OS: "This disk/iSCSI disk is 'busy' so you can't detach it." Me: "Bullshit. YOINK, detached, now deal with it." IMO this area is in need of some serious bar-raising.

    ml> and the main reason to use TLER is when using those drives
    ml> with hardware RAID cards which will kick a drive out of the
    ml> array if it takes longer than 10 seconds.

Yup, which is something the drive will not do unless it encounters an ERROR. That is the E in TLER. In other words, the feature as described prevents you from noticing and invoking warranty replacement on your about-to-fail drive. For this you pay double. Have I got that right?

In any case, the obvious proper place to fix this is in the RAID-on-a-card firmware, not the disk firmware, if it even needs fixing, which is unclear to me. Unless the disk manufacturers are going to offer a feature "do not spend more than 1 second out of every 2 seconds 'trying harder' to read marginal data, just return errors", which would actually have real value, the only reason TLER exists is that it can convince all you gamers to pay twice as much for a drive because they've flipped a single bit in the firmware and then shovelled a big pile of bullshit into your heads.

    ml> Can anyone else here comment if they have had experience with
    ml> the WD drives and ZFS and if they have TLER enabled or
    ml> disabled?

I do not have any problems with drives dropping out of ZFS using the normal TLER setting. I do have problems with slowly-failing drives fucking up the whole system. ZFS doesn't deal with them gracefully, and I have to find the bad drive and remove it by hand.
All this stuff about spares automatically replacing drives while users never notice is largely a fantasy. Neither observation leads me to want TLER.

However, observations like this "why did my disks suddenly slow down?" lead me to avoid WD drives, period, for ZFS or not ZFS or anything at all. Whipping up all this marketing silliness around TLER also leads me to avoid them, because I know they will shovel bullshit and FUD to justify jacked prices.
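For what it's worth, the timeout that actually governs this on Solaris-derived systems lives in the sd disk driver, not in ZFS, and it is tunable. A sketch of the /etc/system entry, assuming the disks attach through sd; the default is 60 seconds, a reboot is required, and values this low carry real risk:

    * /etc/system: lower the sd per-command timeout from its 60 s default
    set sd:sd_io_time = 10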
Just to update the status and findings.

I've checked the TLER settings, and they are off by default.

I moved the source pool to another chassis and ran the 3.8TB send again. This time, no problems at all! The differences are:
1. New chassis.
2. Bigger memory: 32GB vs. 12GB.
3. Although the WDIDLE timer is disabled by default, I changed the HD mode from silent to performance in HDTune. I once read on some website that this might also fix the disk head park/unpark issue (aka C1).

So TLER does not seem to be the root cause, or at least having it off is OK.

My next steps will be:
1. Move the HDs back to see if it was the "performance mode" change that fixed the issue.
2. If not, add more memory and try again.

By the way, in HDTune I saw that C7 (Ultra DMA CRC error count) is a little high, which indicates a potential connection issue. Maybe it is all caused by the enclosure?
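That counter can also be read outside HDTune: smartmontools reports it as SMART attribute 199. A sketch, with an illustrative device path:

    # a raw value of attribute 199 that keeps climbing usually points
    # at cabling or the backplane rather than the disk itself
    smartctl -A /dev/rdsk/c0t3d0 | grep -i crc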
On 2010-Sep-11 00:52 UTC, Richard Elling wrote, re: [zfs-discuss] performance leakage when copy huge data:
On Sep 9, 2010, at 5:55 PM, Fei Xu wrote:

> Just to update the status and findings.

Thanks for the update.

> I've checked the TLER settings, and they are off by default.
>
> I moved the source pool to another chassis and ran the 3.8TB send again. This time, no problems at all! The differences are:
> 1. New chassis.

Can you describe the old and new chassis in detail? Model numbers?

> 2. Bigger memory: 32GB vs. 12GB.

It is not a memory issue.

> 3. Although the WDIDLE timer is disabled by default, I changed the HD mode from silent to performance in HDTune. I once read on some website that this might also fix the disk head park/unpark issue (aka C1).

Not a bad idea.

> So TLER does not seem to be the root cause, or at least having it off is OK.

Definitely not a TLER issue.

> My next steps will be:
> 1. Move the HDs back to see if it was the "performance mode" change that fixed the issue.
> 2. If not, add more memory and try again.

It is not a memory issue.

> By the way, in HDTune I saw that C7 (Ultra DMA CRC error count) is a little high, which indicates a potential connection issue. Maybe it is all caused by the enclosure?

Bingo!
 -- richard

--
OpenStorage Summit, October 25-27, Palo Alto, CA
http://nexenta-summit2010.eventbrite.com
Richard Elling
richard@nexenta.com   +1-760-896-4422
Enterprise class storage for everyone
www.nexenta.com
> > By the way, in HDTune I saw that C7 (Ultra DMA CRC error count) is a little high, which indicates a potential connection issue. Maybe it is all caused by the enclosure?
>
> Bingo!

You are right. I've done a lot of tests and narrowed the defect down to the "problem hardware". The two pools work fine in one chassis, but after moving them back to the original enclosure, both cp and zfs send fail again.

I also noticed that when the machine boots up and reads the ZFS configuration, there is a warning message:

    Reading ZFS config:
    *Warning* /pci@0,0/pci8086,340f@8/pci15d9,1@0 (mpt0):
    Discovery in progress, can't verify IO unit config.

I searched a lot but cannot find more details.

My two server configurations:
1. "Problem chassis": Supermicro SuperChassis 847E2, Tysonberg MB with onboard LSI 1068E (IT mode, which directly exposes the HDs to the system without RAID), single Xeon 5520.
2. "Good chassis": self-developed chassis by another department, S5000WB MB, single E5504, two PCIe 4x LSI 3081 HBA cards.

The SAS cables all seem to be connected correctly. I suspect an issue with the onboard 1068E, and will move an LSI 3081 card to the "problem" server to test.
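Two quick checks with stock Solaris tooling that could help confirm a transport-level fault on the suspect chassis (nothing here is specific to this pool layout):

    # per-device error counters; climbing "Transport Errors" implicate
    # cabling, backplane/expander, or HBA rather than the media
    iostat -En

    # dump the FMA error log and look for transport/CRC ereports
    fmdump -eV | more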